
Core ML and On-Device AI for iOS Enterprise Apps: The Complete Guide for US Companies 2026

Core ML runs AI models on the Apple Neural Engine, so no data leaves the device. For healthcare, financial services, and regulated enterprise iOS apps, that is not a preference; it is often the only architecture that satisfies compliance requirements.

Anurag Rathod · Technical Lead, Wednesday Solutions
9 min read·Published Apr 24, 2026·Updated Apr 24, 2026

A cloud AI API call for a medical image takes 400-800ms round-trip and sends protected health information to a server your legal team will ask about during the next HIPAA audit. A Core ML inference call on the Apple Neural Engine takes 12-40ms and never leaves the device. For the enterprise iOS use cases below, that difference is the entire architecture decision.

Key findings

Core ML inference on the Apple Neural Engine runs 10-50x faster than equivalent cloud API round-trips for classification and detection tasks — with zero latency variance from network conditions.

On-device processing means no PHI, financial data, or proprietary content passes through a cloud AI provider's infrastructure — removing the HIPAA Business Associate Agreement requirement for AI inference.

Core ML supports four enterprise use cases out of the box: document classification, image analysis, voice transcription (Whisper), and real-time translation — all without a server component.

Model deployment via CloudKit allows Core ML model updates to reach users without an App Store update cycle, enabling continuous model improvement independent of feature releases.

What Core ML actually does

Core ML is Apple's framework for running trained AI models on iOS devices. The models run on the Apple Neural Engine — dedicated hardware that Apple has built into every iPhone since 2017. The Neural Engine handles matrix multiplication (the core operation of neural network inference) without touching the CPU or GPU, which means fast inference that does not drain the battery or compete with app rendering.

Core ML does not train models. You bring a trained model — from TensorFlow, PyTorch, scikit-learn, or another ML framework — convert it to Core ML format using Apple's coremltools library, and bundle it with the app. From that point, the model runs on-device for every inference call.

What this means for enterprise iOS development: AI features that process data produced in the app — a document the user photographed, a voice note the user recorded, a screen the user is viewing — can run entirely on the device. No network request. No cloud dependency. No data leaving the user's hands.
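As a minimal sketch of that flow: the app loads the bundled, compiled model at launch and runs every prediction locally. "DocumentClassifier" here is a hypothetical model name standing in for whatever .mlpackage you bundle; Xcode compiles it to .mlmodelc at build time.

```swift
import CoreML

// Load a hypothetical bundled model ("DocumentClassifier" is a placeholder name)
// and configure it so Core ML can schedule work onto the Neural Engine.
func loadBundledModel() throws -> MLModel {
    guard let url = Bundle.main.url(forResource: "DocumentClassifier",
                                    withExtension: "mlmodelc") else {
        throw CocoaError(.fileNoSuchFile)
    }
    let config = MLModelConfiguration()
    config.computeUnits = .all   // CPU, GPU, and Neural Engine as available
    return try MLModel(contentsOf: url, configuration: config)
}
```

From this point every model.prediction(from:) call runs on-device; there is no network path to secure or disclose.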

Four enterprise use cases for Core ML

Document classification. Core ML can classify documents by type (invoice, contract, ID card, medical record) or by content category. For enterprise field apps that process paperwork — an insurance adjuster photographing claim documents, a field service technician capturing job completion forms, a healthcare worker documenting patient intake — on-device document classification routes the document to the right workflow without sending it to a cloud classifier.

The model takes a camera frame or scanned image as input and returns a category label with a confidence score in under 50ms. The app uses the classification to pre-fill a form, trigger a workflow, or flag a document for review.
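That classification step is typically wired up through the Vision framework, which handles image scaling and hands frames to Core ML. A sketch, assuming a VNCoreMLModel already wrapped around your classifier (the model itself and the label set are app-specific):

```swift
import Vision
import CoreML

// Classify one camera frame with a Core ML classifier via Vision.
// The model is app-specific; the completion receives the top label
// and its confidence, e.g. an "invoice" label near 0.9.
func classifyDocument(pixelBuffer: CVPixelBuffer,
                      model: VNCoreMLModel,
                      completion: @escaping (String, Float) -> Void) {
    let request = VNCoreMLRequest(model: model) { request, _ in
        guard let top = (request.results as? [VNClassificationObservation])?.first
        else { return }
        completion(top.identifier, top.confidence)
    }
    request.imageCropAndScaleOption = .centerCrop

    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up)
    try? handler.perform([request])
}
```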

Image analysis for damage assessment and medical imaging. Object detection and image segmentation models run on-device to identify specific features in images. For insurance and property damage use cases, the model identifies damage regions in a photograph and estimates severity. For wound care and clinical documentation, the model identifies anatomical landmarks or wound characteristics in a camera frame.

The healthcare case here is worth stating plainly: sending a photograph of a patient wound to a cloud API creates a HIPAA-covered data flow. Running the same analysis on-device using Core ML creates no data flow to audit or consent for.

Voice transcription using Whisper. OpenAI's Whisper speech recognition models have been converted to Core ML format and are publicly available. The smaller Whisper variants (Whisper Tiny at 39M parameters, Whisper Base at 74M parameters) run in real time on iPhones with A12 chips and above.

Enterprise use cases: field technician note dictation that creates structured work orders, clinical documentation from provider voice notes, and accessibility features for users who cannot type. The audio never leaves the device. A cloud transcription API (AWS Transcribe, Azure Speech) processes the same audio on a server — a data flow your privacy policy must disclose and your IT team must secure.

Real-time translation. The Natural Language framework includes on-device language detection and basic translation for common languages. For enterprise apps serving multilingual workforces — a construction management app used by crews who speak Spanish, Portuguese, or Mandarin — on-device translation renders form labels and status messages in the user's language without a network request.
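The language-detection half of this is a small amount of code against the Natural Language framework. A sketch:

```swift
import NaturalLanguage

// On-device language detection: no text leaves the device.
func dominantLanguage(of text: String) -> NLLanguage? {
    let recognizer = NLLanguageRecognizer()
    recognizer.processString(text)
    return recognizer.dominantLanguage
}
```

The returned NLLanguage value can then drive which localized strings or translated labels the app renders.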

HIPAA and on-device processing

HIPAA's primary concern for AI in enterprise iOS apps is where Protected Health Information goes. PHI includes any health-related data that could identify a specific individual — a patient name combined with a diagnosis, a photograph of a wound with a case number, an audio recording of a clinical visit.

When an app sends PHI to a cloud AI API, the API provider becomes a Business Associate under HIPAA — a third party that must sign a Business Associate Agreement (BAA) and accept HIPAA liability. Getting a BAA from a cloud AI provider adds legal overhead, creates a vendor dependency, and requires audit trails for every data transfer.

When an app processes PHI using Core ML on-device, no PHI is transmitted. There is no Business Associate to contract with for the AI inference step. The data never leaves the device, which means the most complex part of a HIPAA technical safeguard — securing data in transit to a third party — does not apply.

For healthcare enterprise iOS apps, on-device Core ML inference is not a performance optimization. It is the architecture that removes a class of HIPAA compliance obligation from the product.

Wednesday built the clinical digital health platform — a seizure tracking app for patients in underserved clinical settings — on this architecture. Patient logs are processed on-device and synced encrypted when connectivity is available. Zero patient records have been lost, and zero PHI has been transmitted to a third-party AI processor.

Converting models to Core ML format

Core ML uses the .mlmodel and .mlpackage file formats. Models trained in any major ML framework can be converted using Apple's coremltools Python library.

The conversion path by source framework:

| Source framework | Conversion approach | Notes |
| --- | --- | --- |
| PyTorch | coremltools.convert() via TorchScript export | Most common path for 2026 models |
| TensorFlow / Keras | coremltools.convert() from SavedModel or H5 | Requires TF 2.x; TF 1.x requires additional steps |
| scikit-learn | coremltools.converters.sklearn | Supports classification, regression, feature engineering |
| ONNX | onnx-coreml converter | Legacy path, now deprecated; prefer converting from the original framework |
| Create ML | Native export | Apple's training tool exports Core ML directly |

Model quantization reduces file size by converting floating-point weights to lower-precision formats. An unquantized PyTorch model converted to Core ML retains the original 32-bit float weights. Applying 8-bit quantization (supported by coremltools) reduces file size by roughly 75% with minimal accuracy loss for classification tasks. For large models, quantization is what keeps over-the-air download sizes practical for users on cellular connections.

Deployment: app update vs on-device download

Once converted, a Core ML model is deployed to users through one of two paths.

Bundled in the app binary. The model file is included in the Xcode project and ships with the app. Users receive the model when they install or update the app. This is the simplest approach and is appropriate for models that change infrequently — a document classifier that is trained once and stable, or a translation model that improves only with major iOS releases.

The constraint: App Store review is required for every model update, which adds 24-72 hours to deployment time. For enterprise apps where model accuracy improvements are part of an ongoing product roadmap, this creates friction.

Downloaded on-device after installation. Core ML Model Deployment using CloudKit stores model files in iCloud and the app checks for updates at launch. When a new model version is available, the app downloads it in the background and uses the new model from the next inference call. No App Store review is required for model updates through this path.

The constraint: the app must handle the case where the model has not yet downloaded — showing a fallback UI, queuing inference requests, or using the bundled baseline model until the downloaded version is available. This adds implementation complexity but is the right architecture for enterprise apps where model quality is an ongoing improvement target.
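A sketch of that fallback pattern, assuming the MLModelCollection API that backs Core ML Model Deployment; the collection identifier "EnterpriseModels" and the entry key "classifier" are hypothetical names you would configure in the deployment dashboard:

```swift
import CoreML

// Prefer the CloudKit-delivered model; fall back to the bundled baseline
// until the download has completed.
func currentClassifier(fallback: MLModel,
                       completion: @escaping (MLModel) -> Void) {
    _ = MLModelCollection.beginAccessing(identifier: "EnterpriseModels") { result in
        if case .success(let collection) = result,
           let entry = collection.entries["classifier"],
           let model = try? MLModel(contentsOf: entry.modelURL) {
            completion(model)       // downloaded model is available
        } else {
            completion(fallback)    // baseline model bundled in the app binary
        }
    }
}
```

The same shape works if you queue inference requests instead of falling back: hold requests until the completion fires, then drain them against whichever model arrived.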

Performance: Neural Engine vs cloud API round-trip

The performance difference between on-device Core ML inference and a cloud API round-trip is significant enough to change the product design.

A cloud API call for image classification involves: capturing the image, encoding it for transmission, sending the request, server queuing, inference on the server, returning the result, and parsing the response. In good network conditions, this takes 400-800ms. On a cellular connection with variable signal — common in field service, healthcare, and logistics settings — it takes 1-3 seconds. In poor connectivity, it times out.

Core ML inference on the Apple Neural Engine involves: passing the input buffer to the Neural Engine and reading the output. For a MobileNetV2-class model, this takes 12-40ms. For a Whisper Tiny transcription of a 10-second audio clip, it takes 200-400ms. The timing is deterministic — it does not vary with network conditions because there is no network.
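Those numbers are easy to verify for your own model: wrap a prediction call in a monotonic-clock timer. A sketch (the model and input are whatever your app already uses):

```swift
import CoreML
import QuartzCore

// Measure one on-device inference in milliseconds using a monotonic clock.
// `input` must match the model's declared input features.
func inferenceLatencyMs(model: MLModel, input: MLFeatureProvider) throws -> Double {
    let start = CACurrentMediaTime()
    _ = try model.prediction(from: input)
    return (CACurrentMediaTime() - start) * 1000
}
```

Run it a few dozen times and discard the first call, which includes one-time Neural Engine model compilation.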

| Inference type | Cloud API round-trip | Core ML on-device | Offline capable |
| --- | --- | --- | --- |
| Image classification | 400-800ms (good network) | 12-40ms | Yes |
| Document classification | 300-600ms | 15-50ms | Yes |
| Object detection | 500-900ms | 20-60ms | Yes |
| Voice transcription (10s clip) | 800-1500ms | 200-400ms (Whisper Tiny) | Yes |
| Text classification | 200-400ms | 5-15ms | Yes |

For enterprise field apps where users work in warehouses, clinical settings, construction sites, or rural locations, the offline capability is as important as the speed. A cloud API that fails in poor connectivity creates a workflow interruption. Core ML runs on the device, not on the network.


What Core ML cannot do

On-device Core ML is the right choice for a specific class of enterprise AI features. It is not a replacement for cloud AI in every scenario.

Large language models. GPT-class models (7B parameters and above) do not fit on iOS devices in their full-precision form. Apple Intelligence, announced in 2024, runs a 3B-parameter on-device model for basic language tasks, but enterprise apps cannot access the Apple Intelligence models directly — they access them through the API surface Apple exposes. For enterprise iOS apps that need full LLM capability (document generation, complex reasoning, multi-turn conversation), cloud AI APIs remain the right architecture. The HIPAA and privacy mitigation is prompt design that avoids including PHI, not on-device inference.

Real-time model updates from production data. Core ML models are static after deployment. A fraud detection model that should update daily based on new transaction patterns cannot run as a Core ML model that learns continuously. This use case requires a cloud model with a fast update cycle. The on-device alternative is a feature extraction model that sends abstracted features (not raw data) to a cloud classifier — keeping PHI or proprietary data on-device while using cloud compute for the classification.
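That hybrid pattern can be sketched as follows: run the feature-extraction model on-device, then POST only the abstract feature vector to the cloud classifier. The endpoint and the idea of JSON-encoding the vector are illustrative assumptions; the point is that no raw image, audio, or PHI appears in the request body.

```swift
import CoreML
import Foundation

// Send only an abstracted feature vector (not raw data) to a cloud classifier.
// The endpoint URL and payload shape are hypothetical.
func classifyRemotely(features: MLMultiArray, endpoint: URL) async throws -> Data {
    // Flatten the on-device feature vector; the source content never leaves the device.
    let vector = (0..<features.count).map { features[$0].doubleValue }

    var request = URLRequest(url: endpoint)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(vector)

    let (data, _) = try await URLSession.shared.data(for: request)
    return data
}
```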

Models requiring enterprise-scale compute. Some inference tasks — video analysis at full resolution, multi-modal reasoning across large documents, batch processing of thousands of records — require more compute than a mobile device can provide. These use cases require cloud infrastructure regardless of the privacy architecture.



About the author

Anurag Rathod

Technical Lead, Wednesday Solutions

Anurag Rathod is a Technical Lead at Wednesday Solutions who has built on-device AI features for healthcare and enterprise iOS apps, including Core ML document classification and real-time image analysis.


Shipped for enterprise and growth teams across US, Europe, and Asia

American Express
Visa
Discover
EY
Smarsh
Kalshi
BuildOps
Ninjavan
Kotak Securities
Rapido
PharmEasy
PayU
Simpl
Docon
Nymble
SpotAI
Zalora
Velotio
Capital Float
Buildd
Kunai