
Core ML and On-Device AI for iOS Enterprise Apps: The Complete Guide for US Companies 2026

The iPhone 15 Pro Neural Engine runs at 35 TOPS. Core ML routes inference through it automatically. Here is what enterprise iOS apps can do with on-device AI today.

Praveen Kumar · Technical Lead, Wednesday Solutions
9 min read·Published Apr 24, 2026·Updated Apr 24, 2026

The iPhone in your employee's pocket has more AI inference capability than a 2020-era data center GPU. Apple's Neural Engine processes AI tasks at 35 TOPS on the latest hardware. For enterprise iOS apps, this is not an abstract hardware specification — it is a capability that can eliminate cloud API dependencies, reduce latency to under one second, and make AI features available without network connectivity.

Key findings

Core ML on Apple A17 Pro achieves 35 TOPS — enabling 7B parameter LLM inference at 10-15 tokens per second on device, with no cloud API call.

Wednesday's Off Grid uses Core ML for image generation on iOS via the Neural Engine. A 512x512 image generates in 8-12 seconds on an iPhone 15 Pro.

On-device Core ML AI is the right choice when data cannot leave the device (compliance), the feature must work offline, or latency below 500ms is required.

Cloud AI remains the right choice when the model is too large for the device, the feature requires internet for other reasons, or the target device fleet is below the Neural Engine capability threshold.

What Core ML enables on enterprise iOS

Core ML is Apple's machine learning framework. Its job is to take an AI model — trained anywhere, using any major ML framework — and run it on an Apple device using the most efficient processor available. The framework automatically routes inference to the Neural Engine for models that benefit from it, to the GPU for models requiring high parallelism, and to the CPU as a fallback.

The developer experience is declarative: load a model, pass it input, receive output. Core ML handles the routing, the memory management, and the hardware optimization. An engineer does not need to write GPU shaders or Neural Engine microcode — they write Swift code that calls the Core ML API.

The model formats Core ML supports are broad. Models trained in TensorFlow, PyTorch, or scikit-learn (or exported to the ONNX interchange format) can be converted to Core ML format using the coremltools Python library. Apple provides pre-trained models via the Core ML Model Gallery covering common enterprise use cases: document classification, object detection, image classification, NLP, and face detection.

For enterprise iOS apps, Core ML enables AI features that would otherwise require a cloud API call, with all the cost, latency, and connectivity dependencies that implies. On-device inference is free (no per-call API cost), fast (Neural Engine inference is typically under 500ms for standard models), private (data does not leave the device), and offline-capable.

The Neural Engine: what devices and what performance

Apple's Neural Engine capability has grown significantly with each chip generation:

| Chip | Device (first appearance) | Neural Engine TOPS |
|---|---|---|
| A11 | iPhone 8 (2017) | 0.6 |
| A12 | iPhone XS (2018) | 5 |
| A14 | iPhone 12 (2020) | 11 |
| A15 | iPhone 13 (2021) | 15.8 |
| A16 | iPhone 14 Pro (2022) | 17 |
| A17 Pro | iPhone 15 Pro (2023) | 35 |
| M1 | iPad Pro 11-inch (2021) | 11 |
| M4 | iPad Pro (2024) | 38 |

For enterprise iOS apps deployed to a known device fleet, the target Neural Engine performance can be calculated from the fleet's Apple device distribution. A company deploying an app to iPhones purchased in the last 3 years will have a mix of A15, A16, and A17 Pro devices. The minimum supported capability — what the app guarantees for all users — is the A15 at 15.8 TOPS.
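This fleet calculation is just a minimum over the chips actually deployed. A minimal sketch, using the TOPS figures from the table above (the helper function and the example fleet are hypothetical):

```python
# Neural Engine throughput per chip generation, from the table above.
NEURAL_ENGINE_TOPS = {
    "A15": 15.8,      # iPhone 13 (2021)
    "A16": 17.0,      # iPhone 14 Pro (2022)
    "A17 Pro": 35.0,  # iPhone 15 Pro (2023)
}

def minimum_fleet_capability(fleet_chips):
    """Return the lowest TOPS figure across the fleet -- the capability
    the app can guarantee for every user."""
    return min(NEURAL_ENGINE_TOPS[chip] for chip in set(fleet_chips))

# A fleet of iPhones purchased in the last 3 years:
fleet = ["A15", "A16", "A16", "A17 Pro", "A15"]
print(minimum_fleet_capability(fleet))  # 15.8
```

The minimum, not the average, is what matters: a model that needs A17 Pro throughput will be unusably slow for every A15 user in the fleet.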

15.8 TOPS is sufficient for: image classification, object detection, text classification, document analysis, speech recognition (on-device, using the Speech framework), and small language models (under 3 billion parameters). It is not sufficient for 7B parameter language model inference at interactive speeds.

For enterprise apps targeting newer deployments (iPhone 14 Pro and later), A16 and A17 Pro capability supports significantly larger models and faster inference speeds.

Enterprise use cases for on-device AI

Document classification and extraction. Enterprise apps frequently handle documents: invoices, contracts, forms, reports. On-device document classification — routing an uploaded document to the correct processing workflow without sending it to a cloud API — is achievable in Core ML with a fine-tuned classification model. Document data extraction (pulling field values from a structured form) uses Vision's text recognition plus a Core ML extraction model. These features require no cloud API call, work offline, and satisfy data residency requirements.

Real-time object detection for field service. Field service apps that assist technicians with equipment identification or defect detection can use Core ML object detection models running on the camera feed. A technician points the camera at a piece of equipment; the app identifies it and overlays maintenance instructions. This runs at 2-10 frames per second on A15 and later, fast enough to be responsive without motion blur.

Medical and clinical image analysis. Healthcare enterprise apps that need to analyze clinical images — wound photography, skin lesion assessment, medication label verification — can run Core ML models that classify or detect features in the image on-device. This keeps patient data on the device, satisfying HIPAA data residency requirements without a cloud API call.

NLP for internal knowledge search. Enterprise knowledge base apps can use on-device sentence similarity models to provide AI-powered search without sending search queries to a cloud API. A Core ML NLP model converts search queries and document content to vector embeddings on-device, and the search is a cosine similarity calculation in memory. This works offline and requires no per-query API cost.
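The similarity calculation itself is small enough to show. A minimal sketch in Python for brevity: on device, the embeddings would come from a Core ML model, while the toy 3-dimensional vectors and document names here are invented for illustration (real embedding models emit hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors: 1.0 means
    identical direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search(query_embedding, doc_embeddings):
    """Rank document IDs by similarity to the query, best match first."""
    ranked = sorted(doc_embeddings.items(),
                    key=lambda kv: cosine_similarity(query_embedding, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked]

docs = {"vpn-setup": [0.9, 0.1, 0.0], "expense-policy": [0.0, 0.2, 0.9]}
print(search([0.8, 0.2, 0.1], docs))  # ['vpn-setup', 'expense-policy']
```

For a few thousand documents this in-memory scan is fast enough; larger corpora would precompute and index the document embeddings.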

Speech recognition. Apple's Speech framework uses Core ML models for on-device speech recognition in multiple languages. Enterprise apps that need voice input — field service voice logging, clinical note dictation, operational commands — can use on-device speech recognition that works without internet and with low latency.

Tell us about your enterprise iOS app's data processing requirements and we will identify which AI features can run on-device.

Get my recommendation

Integrating Core ML in an enterprise iOS app

Core ML integration follows a consistent pattern regardless of the model type.

Step 1: Model selection or training. Choose a pre-trained model from Apple's Core ML Model Gallery if the use case matches a common task. For specialized enterprise use cases, fine-tune a base model on your organization's data using TensorFlow or PyTorch, then convert to Core ML format using coremltools.

Step 2: Model conversion and optimization. The coremltools library converts models from TensorFlow, PyTorch, and ONNX to the .mlpackage or .mlmodel format. During conversion, apply quantization (8-bit integer quantization reduces model size by 75% with minimal accuracy loss) to reduce the model's memory footprint and improve inference speed on the Neural Engine.

Step 3: Model packaging. The Core ML model is bundled with the app or downloaded on first launch. Models included in the app bundle increase the app size but are available immediately. Downloaded models require an initial network download but allow model updates without an app store release.

Step 4: Inference implementation. Core ML inference is called from Swift using the model's auto-generated Swift interface. The inference call is synchronous on a background thread — never on the main thread. Results are passed back to the UI layer via completion handler or async/await.

Step 5: Error handling and fallback. Core ML inference can fail if the model encounters an unexpected input format or the device runs out of memory for very large models. Enterprise implementations require fallback handling: if on-device inference fails, the app falls back to a cloud API call or degrades the feature gracefully.
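The fallback pattern from step 5 is language-agnostic; a sketch in Python for brevity, where `run_on_device` and `call_cloud_api` are hypothetical stand-ins for the Core ML prediction call and the cloud client:

```python
def classify_with_fallback(document, run_on_device, call_cloud_api):
    """Try on-device inference first; fall back to the cloud API when it
    fails (unsupported device, out of memory, bad input format)."""
    try:
        return run_on_device(document), "on-device"
    except Exception:
        return call_cloud_api(document), "cloud"

# Simulate a device where on-device inference raises:
def broken_local(_doc):
    raise RuntimeError("model not supported on this device")

result, path = classify_with_fallback("invoice.pdf", broken_local,
                                      lambda doc: "invoice")
print(result, path)  # invoice cloud
```

Note that this fallback sends data off-device, so it must be disabled for features where on-device inference was chosen for compliance reasons; there, graceful degradation is the only acceptable fallback.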

Model update strategy. Core ML models in the app bundle require an app update to change. Models downloaded at runtime can be updated without an app release. For enterprise apps where model accuracy needs to improve over time, a remote model update strategy — models stored in a CDN, downloaded and cached locally — is more agile than app-bundle-only models.
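One way to sketch the download-and-cache strategy, with an integrity check before a downloaded model is trusted. The function and its parameters are hypothetical, and a production version would also handle model versioning, atomic writes, and retry:

```python
import hashlib
from pathlib import Path

def cached_model_path(cache_dir, model_name, fetch_bytes, expected_sha256):
    """Return a local path to the model, calling fetch_bytes (e.g. an
    HTTPS GET against the CDN) only when the cached copy is missing,
    and verifying the download against a known checksum."""
    path = Path(cache_dir) / f"{model_name}.mlpackage.bin"
    if not path.exists():
        data = fetch_bytes()
        if hashlib.sha256(data).hexdigest() != expected_sha256:
            raise ValueError("model download failed checksum verification")
        path.write_bytes(data)
    return path
```

The checksum matters: a model file corrupted in transit can crash Core ML at load time, and an unverified download is an attack surface in an enterprise app.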

On-device vs cloud AI: the decision framework

The choice between on-device Core ML and cloud AI is a four-question framework.

Question 1: Can the data leave the device? If the enterprise has data residency requirements that prevent sending the data to a cloud API — patient images, financial documents, proprietary technical data — on-device inference is required. No cloud call is made, no data leaves the device.

Question 2: Does the feature need to work offline? Field service apps, clinical apps, and logistics apps operate in environments without reliable internet. If the AI feature must work offline, on-device inference is required.

Question 3: Does the model fit on the device? A model that requires 10GB of RAM cannot run on a mobile device. As of 2026, enterprise iOS apps can practically deploy models up to approximately 3-4GB on devices with 6GB+ RAM (iPhone 14 Pro and later). Larger models — GPT-4 class — require cloud APIs.

Question 4: Is latency under 500ms required? Cloud AI API round-trip time is 300ms-2000ms depending on the model and server load. On-device inference for standard classification models is typically 20-200ms. For features where the user expects instantaneous response — real-time camera overlay, voice command interpretation, predictive text — on-device inference is faster.

| Decision factor | On-device | Cloud AI |
|---|---|---|
| Data cannot leave device | Required | Not viable |
| Offline capability needed | Required | Not viable |
| Model size under 4GB | Feasible | Also feasible |
| Sub-500ms latency required | Preferred | May not achieve |
| Complex, large models needed | Limited | Required |
| Model updates without app release | Harder | Easier |
| Per-query cost | Zero | API cost per call |
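The four questions collapse into a small decision function. A sketch that treats data residency and offline capability as hard constraints, checked first (the return labels are illustrative, not API names):

```python
def choose_inference_path(data_must_stay_on_device, must_work_offline,
                          model_fits_on_device, needs_sub_500ms_latency):
    """Apply the four questions in order; hard constraints come first."""
    if data_must_stay_on_device or must_work_offline:
        if not model_fits_on_device:
            raise ValueError("constraints require on-device, but the model "
                             "does not fit; pick a smaller model")
        return "on-device"
    if not model_fits_on_device:
        return "cloud"
    if needs_sub_500ms_latency:
        return "on-device"
    return "either (decide on cost and update cadence)"

# A HIPAA wound-photography feature: data stays on device, model fits.
print(choose_inference_path(True, False, True, False))  # on-device
```

The raised error is the interesting branch: when compliance demands on-device but the model is too large, the resolution is a smaller model, not a cloud API.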

On-device LLMs and language models

Language model inference on-device is now practical on high-end iPhones. On the A17 Pro (iPhone 15 Pro), 7 billion parameter models run at 10-15 tokens per second — fast enough for real-time text generation and summarization in enterprise workflows.

The practical models for enterprise use: Llama 3.2 3B runs at 30+ tokens per second on A16 and A17 Pro. 7B-class models (such as Llama 2 7B or Mistral 7B) run at 10-15 tokens per second on A17 Pro. Smaller distilled models (1B-2B parameters) run at 50+ tokens per second on A15 and later, enabling responsive autocomplete and classification.

Enterprise use cases for on-device LLMs: document summarization (summarize a contract or report on-device, no data leaves the device), field service report generation (assist a technician in writing an accurate service report from voice notes), clinical note synthesis (generate a clinical summary from structured patient data), and internal knowledge retrieval with on-device semantic search.

The deployment challenge is model size. A 7B parameter model quantized to 4-bit is approximately 4GB. This is a significant download for users, and it consumes approximately 4GB of storage. For enterprise apps, a one-time download on WiFi is acceptable. For apps that need to provide AI features on day one without a large download, smaller models (1B-3B parameters) or cloud APIs are the right starting point.
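The arithmetic behind these figures, as a quick sanity check (the 300-token summary length and 12 tokens-per-second rate are invented example inputs; 7B parameters at 4 bits each works out to 3.5GB, consistent with the "approximately 4GB" figure above once format overhead is included):

```python
def quantized_size_gb(param_count, bits_per_param):
    """Weight storage for a quantized model, in gigabytes."""
    return param_count * bits_per_param / 8 / 1e9

def generation_seconds(output_tokens, tokens_per_second):
    """Wall-clock time to generate a response of a given length."""
    return output_tokens / tokens_per_second

print(f"{quantized_size_gb(7e9, 4):.1f} GB")   # 3.5 GB for 7B at 4-bit
print(f"{generation_seconds(300, 12):.0f} s")  # 25 s for a 300-token summary
```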

How Wednesday approaches on-device AI

Wednesday has shipped on-device AI in iOS apps via Core ML. Wednesday's On-Device AI (Off Grid) product uses Core ML for on-device image generation, running Stable Diffusion models on the iPhone 15 Pro's Neural Engine to generate 512x512 images in 8-12 seconds without a cloud API call.

For enterprise iOS engagements with on-device AI requirements, Wednesday's approach is:

Identify the AI feature requirements. Map each to the on-device vs cloud framework. For on-device features, identify the target device fleet and determine the available Neural Engine capability.

Select or fine-tune the model. For standard classification and detection tasks, evaluate Apple's Core ML Model Gallery before custom training. For enterprise-specific classification (document types, product defects, specialized medical imaging), fine-tune a base model on enterprise data.

Optimize for on-device deployment. Apply 8-bit or 4-bit quantization. Evaluate the accuracy-size trade-off for the specific use case. Profile inference speed on the minimum target device.

Implement with fallback. On-device inference is the primary path. Cloud API is the fallback for cases where on-device inference is unavailable or the device does not support the model.

Tell us what AI features your enterprise iOS app needs. We will scope the on-device vs cloud approach and give you a specific implementation plan.

Book my 30-min call
4.8 on Clutch
4x faster with AI · 2x fewer crashes · 100% money back

Not ready for a call yet? Browse on-device AI guides and enterprise iOS decision frameworks.

Read more decision guides

About the author

Praveen Kumar

LinkedIn →

Technical Lead, Wednesday Solutions

Praveen leads mobile engineering at Wednesday Solutions, specializing in React Native architecture, performance, and enterprise-scale delivery.

Four weeks from this call, a Wednesday squad is shipping your mobile app. 30 minutes confirms the team shape and start date.

Get your start date

Shipped for enterprise and growth teams across US, Europe, and Asia

American Express
Visa
Discover
EY
Smarsh
Kalshi
BuildOps
Ninjavan
Kotak Securities
Rapido
PharmEasy
PayU
Simpl
Docon
Nymble
SpotAI
Zalora
Velotio
Capital Float
Buildd
Kunai
Kalsi