
Core ML and On-Device AI for iOS Enterprise Apps: The Complete Guide for US Companies 2026

The iPhone 15 Pro Neural Engine runs at 35 TOPS. Core ML routes inference through it automatically. Here is what enterprise iOS apps can do with on-device AI today.

Praveen Kumar · Technical Lead, Wednesday Solutions
9 min read·Published Apr 24, 2026·Updated Apr 24, 2026

The iPhone in your employee's pocket has more AI inference capability than a 2020-era data center GPU. Apple's Neural Engine processes AI tasks at 35 TOPS on the latest hardware. For enterprise iOS apps, this is not an abstract hardware specification — it is a capability that can eliminate cloud API dependencies, reduce latency to under one second, and make AI features available without network connectivity.

Key findings

Core ML on Apple A17 Pro achieves 35 TOPS — enabling 7B parameter LLM inference at 10-15 tokens per second on device, with no cloud API call.

Wednesday's Off Grid uses Core ML for image generation on iOS via the Neural Engine. A 512x512 image generates in 8-12 seconds on an iPhone 15 Pro.

On-device Core ML AI is the right choice when data cannot leave the device (compliance), the feature must work offline, or latency below 500ms is required.

Cloud AI remains the right choice when the model is too large for the device, the feature requires internet for other reasons, or the target device fleet is below the Neural Engine capability threshold.

What Core ML enables on enterprise iOS

Core ML is Apple's machine learning framework. Its job is to take an AI model — trained anywhere, using any major ML framework — and run it on an Apple device using the most efficient processor available. The framework automatically routes inference to the Neural Engine for models that benefit from it, to the GPU for models requiring high parallelism, and to the CPU as a fallback.

The developer experience is declarative: load a model, pass it input, receive output. Core ML handles the routing, the memory management, and the hardware optimization. An engineer does not need to write GPU shaders or Neural Engine microcode — they write Swift code that calls the Core ML API.

The model formats Core ML supports are broad. Models trained in TensorFlow, PyTorch, or scikit-learn (or exported to the ONNX interchange format) can be converted to Core ML format using the coremltools Python library. Apple provides pre-trained models via the Core ML Model Gallery covering common enterprise use cases: document classification, object detection, image classification, NLP, and face detection.

For enterprise iOS apps, Core ML enables AI features that would otherwise require a cloud API call, with all the cost, latency, and connectivity dependencies that implies. On-device inference is free (no per-call API cost), fast (Neural Engine inference is typically under 500ms for standard models), private (data does not leave the device), and offline-capable.

The Neural Engine: what devices and what performance

Apple's Neural Engine capability has grown significantly with each chip generation:

| Chip | Device (first appearance) | Neural Engine TOPS |
|---|---|---|
| A11 | iPhone 8 (2017) | 0.6 |
| A12 | iPhone XS (2018) | 5 |
| A14 | iPhone 12 (2020) | 11 |
| A15 | iPhone 13 (2021) | 15.8 |
| A16 | iPhone 14 Pro (2022) | 17 |
| A17 Pro | iPhone 15 Pro (2023) | 35 |
| M1 | iPad Pro 11-inch (2021) | 11 |
| M4 | iPad Pro (2024) | 38 |

For enterprise iOS apps deployed to a known device fleet, the target Neural Engine performance can be calculated from the fleet's Apple device distribution. A company deploying an app to iPhones purchased in the last 3 years will have a mix of A15, A16, and A17 Pro devices. The minimum supported capability — what the app guarantees for all users — is the A15 at 15.8 TOPS.
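This fleet calculation is just a minimum over the chips actually deployed. A minimal sketch, using the TOPS figures from the table above (the helper function and the example fleet are hypothetical):

```python
# Neural Engine throughput per chip generation, from the table above.
NEURAL_ENGINE_TOPS = {
    "A15": 15.8,      # iPhone 13 (2021)
    "A16": 17.0,      # iPhone 14 Pro (2022)
    "A17 Pro": 35.0,  # iPhone 15 Pro (2023)
}

def minimum_fleet_capability(fleet_chips):
    """Return the lowest TOPS figure across the fleet -- the capability
    the app can guarantee for every user."""
    return min(NEURAL_ENGINE_TOPS[chip] for chip in set(fleet_chips))

# A fleet of iPhones purchased in the last 3 years:
fleet = ["A15", "A16", "A16", "A17 Pro", "A15"]
print(minimum_fleet_capability(fleet))  # 15.8
```

The minimum, not the average, is what matters: a model that needs A17 Pro throughput will be unusably slow for every A15 user in the fleet.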

15.8 TOPS is sufficient for: image classification, object detection, text classification, document analysis, speech recognition (on-device, using the Speech framework), and small language models (under 3 billion parameters). It is not sufficient for 7B parameter language model inference at interactive speeds.

For enterprise apps targeting newer deployments (iPhone 14 Pro and later), A16 and A17 Pro capability supports significantly larger models and faster inference speeds.

Enterprise use cases for on-device AI

Document classification and extraction. Enterprise apps frequently handle documents: invoices, contracts, forms, reports. On-device document classification — routing an uploaded document to the correct processing workflow without sending it to a cloud API — is achievable in Core ML with a fine-tuned classification model. Document data extraction (pulling field values from a structured form) uses Vision's text recognition plus a Core ML extraction model. These features require no cloud API call, work offline, and satisfy data residency requirements.

Real-time object detection for field service. Field service apps that assist technicians with equipment identification or defect detection can use Core ML object detection models running on the camera feed. A technician points the camera at a piece of equipment; the app identifies it and overlays maintenance instructions. This runs at 2-10 frames per second on A15 and later, fast enough to be responsive without motion blur.

Medical and clinical image analysis. Healthcare enterprise apps that need to analyze clinical images — wound photography, skin lesion assessment, medication label verification — can run Core ML models that classify or detect features in the image on-device. This keeps patient data on the device, satisfying HIPAA data residency requirements without a cloud API call.

NLP for internal knowledge search. Enterprise knowledge base apps can use on-device sentence similarity models to provide AI-powered search without sending search queries to a cloud API. A Core ML NLP model converts search queries and document content to vector embeddings on-device, and the search is a cosine similarity calculation in memory. This works offline and requires no per-query API cost.
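The similarity calculation itself is small enough to show. A minimal sketch in Python for brevity: on device, the embeddings would come from a Core ML model, while the toy 3-dimensional vectors and document names here are invented for illustration (real embedding models emit hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors: 1.0 means
    identical direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search(query_embedding, doc_embeddings):
    """Rank document IDs by similarity to the query, best match first."""
    ranked = sorted(doc_embeddings.items(),
                    key=lambda kv: cosine_similarity(query_embedding, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked]

docs = {"vpn-setup": [0.9, 0.1, 0.0], "expense-policy": [0.0, 0.2, 0.9]}
print(search([0.8, 0.2, 0.1], docs))  # ['vpn-setup', 'expense-policy']
```

For a few thousand documents this in-memory scan is fast enough; larger corpora would precompute and index the document embeddings.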

Speech recognition. Apple's Speech framework uses Core ML models for on-device speech recognition in multiple languages. Enterprise apps that need voice input — field service voice logging, clinical note dictation, operational commands — can use on-device speech recognition that works without internet and with low latency.

Tell us about your enterprise iOS app's data processing requirements and we will identify which AI features can run on-device.

Get my recommendation

Integrating Core ML in an enterprise iOS app

Core ML integration follows a consistent pattern regardless of the model type.

Step 1: Model selection or training. Choose a pre-trained model from Apple's Core ML Model Gallery if the use case matches a common task. For specialized enterprise use cases, fine-tune a base model on your organization's data using TensorFlow or PyTorch, then convert to Core ML format using coremltools.

Step 2: Model conversion and optimization. The coremltools library converts models from TensorFlow, PyTorch, and ONNX to the .mlpackage or .mlmodel format. During conversion, apply quantization (8-bit integer quantization reduces model size by 75% with minimal accuracy loss) to reduce the model's memory footprint and improve inference speed on the Neural Engine.

Step 3: Model packaging. The Core ML model is bundled with the app or downloaded on first launch. Models included in the app bundle increase the app size but are available immediately. Downloaded models require an initial network download but allow model updates without an app store release.

Step 4: Inference implementation. Core ML inference is called from Swift using the model's auto-generated Swift interface. The inference call is synchronous on a background thread — never on the main thread. Results are passed back to the UI layer via completion handler or async/await.

Step 5: Error handling and fallback. Core ML inference can fail if the model encounters an unexpected input format or the device runs out of memory for very large models. Enterprise implementations require fallback handling: if on-device inference fails, the app falls back to a cloud API call or degrades the feature gracefully.
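The fallback pattern from step 5 is language-agnostic; a sketch in Python for brevity, where `run_on_device` and `call_cloud_api` are hypothetical stand-ins for the Core ML prediction call and the cloud client:

```python
def classify_with_fallback(document, run_on_device, call_cloud_api):
    """Try on-device inference first; fall back to the cloud API when it
    fails (unsupported device, out of memory, bad input format)."""
    try:
        return run_on_device(document), "on-device"
    except Exception:
        return call_cloud_api(document), "cloud"

# Simulate a device where on-device inference raises:
def broken_local(_doc):
    raise RuntimeError("model not supported on this device")

result, path = classify_with_fallback("invoice.pdf", broken_local,
                                      lambda doc: "invoice")
print(result, path)  # invoice cloud
```

Note that this fallback sends data off-device, so it must be disabled for features where on-device inference was chosen for compliance reasons; there, graceful degradation is the only acceptable fallback.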

Model update strategy. Core ML models in the app bundle require an app update to change. Models downloaded at runtime can be updated without an app release. For enterprise apps where model accuracy needs to improve over time, a remote model update strategy — models stored in a CDN, downloaded and cached locally — is more agile than app-bundle-only models.
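One way to sketch the download-and-cache strategy, with an integrity check before a downloaded model is trusted. The function and its parameters are hypothetical, and a production version would also handle model versioning, atomic writes, and retry:

```python
import hashlib
from pathlib import Path

def cached_model_path(cache_dir, model_name, fetch_bytes, expected_sha256):
    """Return a local path to the model, calling fetch_bytes (e.g. an
    HTTPS GET against the CDN) only when the cached copy is missing,
    and verifying the download against a known checksum."""
    path = Path(cache_dir) / f"{model_name}.mlpackage.bin"
    if not path.exists():
        data = fetch_bytes()
        if hashlib.sha256(data).hexdigest() != expected_sha256:
            raise ValueError("model download failed checksum verification")
        path.write_bytes(data)
    return path
```

The checksum matters: a model file corrupted in transit can crash Core ML at load time, and an unverified download is an attack surface in an enterprise app.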

On-device vs cloud AI: the decision framework

The choice between on-device Core ML and cloud AI is a four-question framework.

Question 1: Can the data leave the device? If the enterprise has data residency requirements that prevent sending the data to a cloud API — patient images, financial documents, proprietary technical data — on-device inference is required. No cloud call is made, no data leaves the device.

Question 2: Does the feature need to work offline? Field service apps, clinical apps, and logistics apps operate in environments without reliable internet. If the AI feature must work offline, on-device inference is required.

Question 3: Does the model fit on the device? A model that requires 10GB of RAM cannot run on a mobile device. As of 2026, enterprise iOS apps can practically deploy models up to approximately 3-4GB on devices with 6GB+ RAM (iPhone 14 Pro and later). Larger models — GPT-4 class — require cloud APIs.

Question 4: Is latency under 500ms required? Cloud AI API round-trip time is 300ms-2000ms depending on the model and server load. On-device inference for standard classification models is typically 20-200ms. For features where the user expects instantaneous response — real-time camera overlay, voice command interpretation, predictive text — on-device inference is faster.

| Decision factor | On-device | Cloud AI |
|---|---|---|
| Data cannot leave device | Required | Not viable |
| Offline capability needed | Required | Not viable |
| Model size under 4GB | Feasible | Also feasible |
| Sub-500ms latency required | Preferred | May not achieve |
| Complex, large models needed | Limited | Required |
| Model updates without app release | Harder | Easier |
| Per-query cost | Zero | API cost per call |
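The four questions collapse into a small decision function. A sketch that treats data residency and offline capability as hard constraints, checked first (the return labels are illustrative, not API names):

```python
def choose_inference_path(data_must_stay_on_device, must_work_offline,
                          model_fits_on_device, needs_sub_500ms_latency):
    """Apply the four questions in order; hard constraints come first."""
    if data_must_stay_on_device or must_work_offline:
        if not model_fits_on_device:
            raise ValueError("constraints require on-device, but the model "
                             "does not fit; pick a smaller model")
        return "on-device"
    if not model_fits_on_device:
        return "cloud"
    if needs_sub_500ms_latency:
        return "on-device"
    return "either (decide on cost and update cadence)"

# A HIPAA wound-photography feature: data stays on device, model fits.
print(choose_inference_path(True, False, True, False))  # on-device
```

The raised error is the interesting branch: when compliance demands on-device but the model is too large, the resolution is a smaller model, not a cloud API.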

On-device LLMs and language models

Language model inference on-device is now practical on high-end iPhones. On the A17 Pro (iPhone 15 Pro), 7 billion parameter models run at 10-15 tokens per second — fast enough for real-time text generation and summarization in enterprise workflows.

The practical models for enterprise use: Llama 3.2 3B runs at 30+ tokens per second on A16 and A17 Pro. 7B-class models (such as Llama 2 7B or Mistral 7B) run at 10-15 tokens per second on A17 Pro. Smaller distilled models (1B-2B parameters) run at 50+ tokens per second on A15 and later, enabling responsive autocomplete and classification.

Enterprise use cases for on-device LLMs: document summarization (summarize a contract or report on-device, no data leaves the device), field service report generation (assist a technician in writing an accurate service report from voice notes), clinical note synthesis (generate a clinical summary from structured patient data), and internal knowledge retrieval with on-device semantic search.

The deployment challenge is model size. A 7B parameter model quantized to 4-bit is approximately 4GB. This is a significant download for users, and it consumes approximately 4GB of storage. For enterprise apps, a one-time download on WiFi is acceptable. For apps that need to provide AI features on day one without a large download, smaller models (1B-3B parameters) or cloud APIs are the right starting point.
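The arithmetic behind these figures, as a quick sanity check (the 300-token summary length and 12 tokens-per-second rate are invented example inputs; 7B parameters at 4 bits each works out to 3.5GB, consistent with the "approximately 4GB" figure above once format overhead is included):

```python
def quantized_size_gb(param_count, bits_per_param):
    """Weight storage for a quantized model, in gigabytes."""
    return param_count * bits_per_param / 8 / 1e9

def generation_seconds(output_tokens, tokens_per_second):
    """Wall-clock time to generate a response of a given length."""
    return output_tokens / tokens_per_second

print(f"{quantized_size_gb(7e9, 4):.1f} GB")   # 3.5 GB for 7B at 4-bit
print(f"{generation_seconds(300, 12):.0f} s")  # 25 s for a 300-token summary
```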

How Wednesday approaches on-device AI

Wednesday has shipped on-device AI in iOS apps via Core ML. Wednesday's On-Device AI (Off Grid) product uses Core ML for on-device image generation, running Stable Diffusion models on the iPhone 15 Pro's Neural Engine to generate 512x512 images in 8-12 seconds without a cloud API call.

For enterprise iOS engagements with on-device AI requirements, Wednesday's approach is:

Identify the AI feature requirements. Map each to the on-device vs cloud framework. For on-device features, identify the target device fleet and determine the available Neural Engine capability.

Select or fine-tune the model. For standard classification and detection tasks, evaluate Apple's Core ML Model Gallery before custom training. For enterprise-specific classification (document types, product defects, specialized medical imaging), fine-tune a base model on enterprise data.

Optimize for on-device deployment. Apply 8-bit or 4-bit quantization. Evaluate the accuracy-size trade-off for the specific use case. Profile inference speed on the minimum target device.

Implement with fallback. On-device inference is the primary path. Cloud API is the fallback for cases where on-device inference is unavailable or the device does not support the model.

Tell us what AI features your enterprise iOS app needs. We will scope the on-device vs cloud approach and give you a specific implementation plan.

Book my 30-min call
4.8 on Clutch
4x faster with AI · 2x fewer crashes · 100% money back

Not ready for a call yet? Browse on-device AI guides and enterprise iOS decision frameworks.

Read more decision guides

About the author

Praveen Kumar

LinkedIn →

Technical Lead, Wednesday Solutions

Praveen leads mobile engineering at Wednesday Solutions, specializing in React Native architecture, performance, and enterprise-scale delivery.

Four weeks from this call, a Wednesday squad is shipping your mobile app. 30 minutes confirms the team shape and start date.

Get your start date

Shipped for enterprise and growth teams across US, Europe, and Asia

American Express
Visa
Discover
EY
Smarsh
Kalshi
BuildOps
Ninjavan
Kotak Securities
Rapido
PharmEasy
PayU
Simpl
Docon
Nymble
SpotAI
Zalora
Velotio
Capital Float
Buildd
Kunai
Kalsi