On-Device AI Cost at Hardware Shipment Scale: What 100,000-Unit Margins Hinge On for US Enterprise 2026

On-device AI is the cheaper option at 100,000 units only if four cost decisions go right. Here is the margin model a CFO needs before signing off on a hardware launch.

Mohammed Ali Chherawalla · Co-founder & CRO, Wednesday Solutions
13 min read · Published May 15, 2026 · Updated May 15, 2026

Trusted by teams at

American Express
Visa
Discover
EY
Smarsh
Kalshi
BuildOps
Kunai

On-device AI is the cheaper option at 100,000 units only if the four cost decisions go right. Get any of them wrong, and the margin per device disappears before the product ships. Most CFOs sign off on the AI feature without understanding which four decisions own the math, and the engineering team finds out at month nine that the hardware tier they picked cannot run the model the product team scoped. The fix is not more engineering. The fix is making the four cost decisions in the right order, before the bill of materials is finalised, before the vendor contract is signed, and before the first prototype ships. Here is the model a CFO needs to run before approving any on-device AI launch at hardware shipment scale.

Four decisions decide on-device AI margin at 100,000 units

Four decisions determine whether on-device AI margin holds up at 100,000-unit shipment scale: model size, hardware tier, inference frequency per user, and update strategy. Locking any of them down without modelling the other three produces a feature that ships at a loss.

On-device AI adds 4 dollars to 80 dollars per unit to the bill of materials, depending on the hardware tier. A 100,000-unit shipment carries a 400,000 dollar to 8 million dollar hardware premium that must be earned back in avoided cloud cost over the device lifetime.

The break-even window for a 100,000-unit shipment is 6 to 24 months for query volumes above 200 inferences per user per month. Below that frequency, hybrid or pure cloud architectures usually win the comparison.
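As a rough sketch of how that break-even window is computed, the function below divides the shipment-level hardware premium by the cloud spend the devices avoid each month. It assumes one active user per shipped unit and a flat per-query cloud price, and it ignores engineering, licensing, and QA cost. The 0.002 dollar per-query price in the usage lines is a hypothetical placeholder, not a figure from this article.

```python
def break_even_months(bom_premium_per_unit: float,
                      units: int,
                      queries_per_user_per_month: float,
                      cloud_cost_per_query: float) -> float:
    """Months until avoided cloud spend repays the on-device hardware premium.

    Assumes one active user per shipped unit and a flat per-query cloud price.
    Engineering, licensing, and QA costs are deliberately left out.
    """
    shipment_premium = bom_premium_per_unit * units
    avoided_per_month = units * queries_per_user_per_month * cloud_cost_per_query
    if avoided_per_month <= 0:
        return float("inf")  # nothing is avoided, so the premium never pays back
    return shipment_premium / avoided_per_month


# Hypothetical inputs: 12 dollars per unit, 100,000 units, 0.002 dollars per query.
for queries in (100, 200, 500):
    months = break_even_months(12, 100_000, queries, 0.002)
    print(f"{queries} queries/user/month -> {months:.1f} months")
```

With those hypothetical inputs the loop prints 60.0, 30.0, and 12.0 months. Because `units` appears in both the premium and the avoided spend, it cancels out: break-even time depends only on the per-unit premium, the query frequency, and the per-query price.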

Why on-device AI margins live or die at hardware shipment scale

The economics of on-device AI are not the economics of a SaaS app with cloud inference. A SaaS team pays per-token cloud cost only when a user runs a query. A hardware team pays the on-device premium on every shipped unit, whether the user runs a query or not. That difference inverts the cost structure. SaaS economics reward low-frequency users. Hardware economics reward high-frequency users because the per-unit hardware premium gets amortised across more inference events.

At 100,000-unit scale, the on-device premium is 400,000 dollars at the low end and 8 million dollars at the high end. The product team is committed to that cost the moment the bill of materials is locked. The CFO cannot decide three months after launch that the AI feature is too expensive and quietly turn it off, because the hardware is already in the field. The decision is one-way.

That one-way commitment is why the four cost decisions have to be made in order, against a unit-economics model that the CFO can defend. The author, Mac, built Off Grid, an on-device AI app with chat, image generation, voice, and vision running locally across iOS, Android, and macOS. Off Grid has shipped to more than 50,000 users with zero paid spend on cloud inference. The numbers in this article come from that experience and from Wednesday engagements that ship hardware-tier AI features to US enterprise clients.

The mistake most teams make is treating on-device AI as a feature decision instead of a margin decision. The feature decision is "should we add AI." The margin decision is "what AI, on what hardware, at what frequency, with what update path." Get the feature decision right and the margin decision wrong, and the project ships at a loss for 36 months.

The four decisions that drive on-device AI cost per device

Four decisions determine per-device AI cost. A vendor who cannot run unit economics across all four has not done this work at hardware shipment scale.

Decision 1: model size. A 1B-parameter quantised model runs on a standard ARM chip with 4GB of RAM. A 7B-parameter model needs a neural-accelerator chip and 8GB. A 13B-parameter model or a multi-modal model needs the high-end neural processor and 12GB or more. The model size decision sets the hardware tier. Most teams pick the model first based on accuracy targets, then discover the hardware tier they need is two tiers above what the product budget allows.

Decision 2: hardware tier. The chip class, the memory budget, and the storage allocation are locked when the bill of materials is finalised. The premium ranges from 4 dollars per unit for the standard ARM tier to 80 dollars per unit for the high-end neural-processor tier. At 100,000 units, that range is 400,000 dollars at the low end and 8 million dollars at the high end. The hardware tier decision is the single largest line item in the on-device AI margin model.

Decision 3: inference frequency per user. The number of queries a typical user runs per month determines how quickly the avoided cloud cost pays back the hardware premium. Below 100 queries per user per month, on-device usually loses to cloud or hybrid at 100,000 units. Above 500 queries per user per month, on-device wins decisively. The frequency assumption has to be defended with usage data from the actual product, not from a market-average benchmark that may apply to a different feature.
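One way to make that defence concrete is to take per-user monthly query counts from a prototype or comparable product, use the median as the typical user, and map it onto the thresholds above. This is a minimal sketch: the sample data is invented, and treating the 100-to-500 band as "run the full margin model" is an interpretation of the text, not a rule it states.

```python
from statistics import median

# Thresholds from the text: below 100 queries per user per month, cloud or hybrid
# usually wins at 100,000 units; above 500, on-device wins decisively.
CLOUD_BELOW = 100
ON_DEVICE_ABOVE = 500


def frequency_verdict(monthly_queries_per_user: list[int]) -> tuple[float, str]:
    """Return the median monthly query count and the verdict it implies."""
    if not monthly_queries_per_user:
        raise ValueError("need usage data from the actual product")
    typical = median(monthly_queries_per_user)
    if typical < CLOUD_BELOW:
        return typical, "cloud or hybrid"
    if typical > ON_DEVICE_ABOVE:
        return typical, "on-device"
    return typical, "run the full margin model"


# Hypothetical prototype usage for five test users.
print(frequency_verdict([40, 120, 300, 650, 900]))
```

The median is used rather than the mean so that a handful of power users cannot inflate the forecast for the typical user.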

Decision 4: update strategy. Models age. A model that ships today is two generations behind in 18 months. The update strategy decides whether the product team rotates the device fleet (a no-go for most enterprise hardware shipments), updates models over the air on the existing hardware (requires the hardware tier to be sized for two model generations ahead), or accepts the model freeze. Each path has a different cost profile and a different competitive risk. A vendor who pitches on-device AI without naming the update strategy is selling a 12-month feature, not a 36-month product.

These four decisions feed the unit-economics model. They are also tightly coupled. Picking a 7B-parameter model forces the neural-accelerator chip tier. Picking the standard ARM tier forces the 1B-parameter model. The CFO who wants a defensible margin model insists on seeing all four decisions on one page before approving the project. The on-device AI vs cloud AI inference cost guide for enterprise teams covers the per-user math; this article covers the per-device shipment math sitting one layer up.
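The coupling can be written down as a lookup that picks the cheapest tier able to run a given model. The tier data below restates this article's figures (model size, minimum RAM, per-unit premium). Treating anything up to 7B parameters as mid-tier and rejecting models above 13B are assumptions, since the text only names the 1B, 7B, and 13B points.

```python
from typing import NamedTuple


class Tier(NamedTuple):
    name: str
    max_params_b: float                # largest model (billions of parameters) the tier runs
    min_ram_gb: int
    premium_per_unit: tuple[int, int]  # BOM premium range in dollars
    runs_multimodal: bool


# Ordered cheapest first, using the figures quoted in this article.
TIERS = [
    Tier("standard ARM", 1, 4, (4, 12), False),
    Tier("mid-tier neural accelerator", 7, 8, (25, 60), False),
    Tier("high-end neural processor", 13, 12, (60, 80), True),
]


def cheapest_tier(model_params_b: float, multimodal: bool = False) -> Tier:
    """Return the cheapest tier that can run the model, or raise if none can."""
    for tier in TIERS:
        if multimodal and not tier.runs_multimodal:
            continue
        if model_params_b <= tier.max_params_b:
            return tier
    raise ValueError(f"no tier in this table runs a {model_params_b}B model")


print(cheapest_tier(7).name)                   # mid-tier neural accelerator
print(cheapest_tier(1, multimodal=True).name)  # high-end neural processor
```

Read in reverse, the same table shows why choosing the standard ARM tier caps the feature at a 1B-parameter model.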

What 100,000-unit shipments actually cost across hardware tiers

The hardware-tier premium at 100,000 units is the largest single number on the project. Here is the range Wednesday has priced across recent engagements and hardware vendor quotes.

Hardware tier | Model class | BOM premium per unit | Premium across 100,000 units
Standard ARM (4-8GB RAM) | 1B-parameter quantised | 4 dollars to 12 dollars | 400,000 dollars to 1.2 million dollars
Mid-tier with neural accelerator (8-12GB RAM) | 7B-parameter | 25 dollars to 60 dollars | 2.5 million dollars to 6 million dollars
High-end neural processor (12GB+ RAM) | 13B-parameter or multi-modal | 60 dollars to 80 dollars | 6 million dollars to 8 million dollars
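The 100,000-unit column is the per-unit premium multiplied by the shipment count. A short sketch that reproduces it, and can be rerun for a different unit count, follows; the dictionary restates the table and assumes nothing else.

```python
UNITS = 100_000

# Per-unit BOM premium ranges in dollars, restated from the table above.
BOM_PREMIUM_PER_UNIT = {
    "Standard ARM (4-8GB RAM)": (4, 12),
    "Mid-tier with neural accelerator (8-12GB RAM)": (25, 60),
    "High-end neural processor (12GB+ RAM)": (60, 80),
}


def shipment_premium(per_unit: tuple[int, int], units: int = UNITS) -> tuple[int, int]:
    """Scale a (low, high) per-unit premium to a whole shipment."""
    low, high = per_unit
    return low * units, high * units


for tier, per_unit in BOM_PREMIUM_PER_UNIT.items():
    low, high = shipment_premium(per_unit)
    print(f"{tier}: {low:,} to {high:,} dollars")
```

Passing a different `units` value reprices every tier for another shipment size.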

The standard ARM tier is the only one that produces a sub-1 million dollar premium at 100,000 units. A team that can ship the product with a 1B-parameter model has a structurally different margin model than a team that needs a 7B-parameter model. Reframing the AI feature to fit the smaller model is the single most powerful cost lever in the project. It is also where most teams give up too quickly, because the smaller model's accuracy looks lower in evaluation.

Beyond the hardware premium, three other cost lines apply at 100,000-unit scale. Engineering cost to integrate, optimise, and ship the model runs 250,000 dollars to 800,000 dollars depending on the feature complexity. Model licensing or fine-tuning cost runs 80,000 dollars to 400,000 dollars depending on whether the team uses an open-source base model or a commercial one. QA and certification (for regulated products) adds 60,000 dollars to 250,000 dollars.

Total project cost for an on-device AI feature on a 100,000-unit shipment ranges from 790,000 dollars at the bottom (standard ARM, open-source 1B model) to 9.45 million dollars at the top (high-end neural processor, commercial 13B model, regulated certification). The margin math compares that total against the cloud cost the on-device feature avoids over the 36-month device lifetime.
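Summing the low ends and the high ends of the four cost lines reproduces both totals. The sketch below is only that arithmetic. Note that the 790,000 dollar floor includes the 60,000 dollar minimum for QA and certification, which the text attributes to regulated products; if that line is dropped, the floor is 730,000 dollars.

```python
# Cost lines for a 100,000-unit shipment, (low, high) in dollars, from this article.
COST_LINES = {
    "hardware premium (BOM)": (400_000, 8_000_000),
    "engineering (integrate, optimise, ship)": (250_000, 800_000),
    "model licensing or fine-tuning": (80_000, 400_000),
    "QA and certification (regulated products)": (60_000, 250_000),
}


def total_range(lines: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the low ends and the high ends of every cost line."""
    low = sum(lo for lo, _ in lines.values())
    high = sum(hi for _, hi in lines.values())
    return low, high


low, high = total_range(COST_LINES)
print(f"{low:,} to {high:,} dollars")  # 790,000 to 9,450,000 dollars

# Dropping the QA line, e.g. for an unregulated product, lowers the floor.
unregulated = {k: v for k, v in COST_LINES.items() if not k.startswith("QA")}
print(f"{total_range(unregulated)[0]:,} dollars")  # 730,000 dollars
```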

When on-device AI loses to cloud at 100,000 units

Cloud AI wins the cost comparison at 100,000 units in three specific situations. A CFO who knows these situations avoids the mistake of mandating on-device when the math points the other way.

Situation 1: low query frequency. Below 100 inferences per user per month, the avoided cloud cost across 36 months does not cover the on-device hardware premium at 100,000 units. A 1-dollar-per-month cloud cost saved across 100,000 users over 36 months is 3.6 million dollars in avoided cost. That covers the standard ARM tier premium and the bottom of the mid-tier range (2.5 million dollars), but not the upper mid-tier range or the high-end tier. The on-device math wins only when query frequency is high enough to amortise the hardware premium.
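The 3.6 million dollar figure is the monthly saving per user times users times months. Comparing it with the shipment-level premiums from the table shows where it runs out. The comparison below uses the hardware premium alone and leaves out engineering, licensing, and QA, which would push every threshold higher.

```python
def avoided_cloud_cost(saving_per_user_month: float, users: int, months: int) -> float:
    """Cloud spend avoided by running inference on-device."""
    return saving_per_user_month * users * months


# Shipment-level hardware premium ranges in dollars at 100,000 units.
TIER_PREMIUMS = {
    "standard ARM": (400_000, 1_200_000),
    "mid-tier": (2_500_000, 6_000_000),
    "high-end": (6_000_000, 8_000_000),
}


def coverage(avoided: float, low: int, high: int) -> str:
    """Describe how much of a tier's premium range the avoided cost covers."""
    if avoided >= high:
        return "covers the full range"
    if avoided >= low:
        return "covers the low end only"
    return "does not cover"


avoided = avoided_cloud_cost(1, 100_000, 36)
print(f"avoided: {avoided:,} dollars")
for tier, (low, high) in TIER_PREMIUMS.items():
    print(f"{tier}: {coverage(avoided, low, high)}")
```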

Situation 2: long-context or multi-modal inference. Some AI features need 13B-parameter models, long context windows (32K tokens or more), or multi-modal fusion (text plus image plus audio). The hardware tier required to run those features on-device pushes the per-unit BOM premium into the 60-80 dollar range, which produces a 6 million to 8 million dollar shipment premium. Cloud cost for the same feature is 0.04 cents to 0.12 cents per query. Unless query frequency is above 800 per user per month, cloud wins at 100,000 units.
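The per-query cloud price and the per-unit premium give a direct break-even frequency: the premium divided by months times price. The sketch below assumes the 36-month device lifetime used elsewhere in this article and counts the hardware premium only. On those inputs the break-even lands at roughly 1,400 to 5,600 queries per user per month, so the 800-query figure reads as a floor below which cloud clearly wins, not as the point where on-device starts to pay.

```python
import math


def break_even_queries_per_month(premium_per_unit: float,
                                 cloud_cents_per_query: float,
                                 months: int = 36) -> int:
    """Smallest whole number of monthly queries per user at which avoided
    cloud spend over `months` covers the per-unit hardware premium.

    The shipment size cancels out, so it is not a parameter.
    """
    cloud_dollars_per_query = cloud_cents_per_query / 100
    return math.ceil(premium_per_unit / (months * cloud_dollars_per_query))


for premium in (60, 80):
    for cents in (0.12, 0.04):
        q = break_even_queries_per_month(premium, cents)
        print(f"{premium} dollars/unit at {cents} cents/query: {q:,} queries/user/month")
```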

Situation 3: rapid model evolution required. If the AI feature improves quickly with each new model generation and the product strategy depends on shipping the latest capability within 90 days of a model release, on-device loses. Hardware lifecycle does not move at model-release speed. Hybrid architectures that run baseline inference on-device and route the latest-capability requests to the cloud are the right answer when model evolution is the competitive lever.

Outside those three situations, on-device AI is usually the cheaper option at 100,000 units, provided the four cost decisions are made in the right order. The AI feature cost per user at scale comparison for 100K and 500K DAU apps walks through the per-user inflection points that drive the per-shipment math.

How to scope on-device AI before signing a hardware shipment contract

The scoping conversation a CFO should run before approving the project follows four steps. Each step is a specific deliverable a vendor should produce.

Step 1: feature accuracy target. Define what accuracy or quality threshold the feature has to hit in production. "The model has to summarise a 2,000-word inspection report into a 200-word case file with 90 percent factual accuracy" is a target. "The model has to be good" is not. Without this target, model size cannot be locked and the hardware tier cannot be priced.

Step 2: query frequency forecast. Forecast how often a typical user runs the feature in a month. The forecast has to be defended with data from a comparable product or from a usability test on a prototype. A vendor who accepts a frequency forecast without challenge is one who will not push back when the assumption falls apart in production.

Step 3: device lifetime and update path. Decide whether the device is a 24-month, 36-month, or 60-month product, and decide whether the model updates over the air during that lifetime. Each decision changes the margin model. A 60-month device with no model updates needs to ship with a model that will still be competitive in year five.

Step 4: hardware tier and BOM premium. With the first three steps locked, the hardware tier is determined by the model the feature needs. The vendor produces a BOM premium per unit and a shipment-level premium at the target unit count. Both numbers go into the unit-economics model alongside the avoided cloud cost over the device lifetime.

A vendor who can deliver these four steps inside the first two weeks of scoping has shipped on-device AI before. A vendor who pushes the four-step model to "we will figure it out in design" is improvising at the company's expense.

Send Wednesday the AI feature on your roadmap, your target unit count, and your hardware tier ceiling. You leave with a one-page margin scoping note showing all four cost decisions.

What CFOs should demand from vendors pitching on-device AI at scale

Most vendor pitches for on-device AI skip the unit-economics math. The deck describes the feature, names the model, and shows accuracy benchmarks. The CFO is then asked to approve a hardware premium without seeing the four cost decisions on one page. That is the wrong way to start the project.

Demand 1: a one-page unit-economics model. The model shows the BOM premium per unit, the engineering cost amortised across the shipment, the avoided cloud cost over the 36-month device lifetime, and the model-update cost over the same window. The model has to break even inside 18 months for a 100,000-unit shipment. If it does not, the project does not deserve sign-off.
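A one-page model of this kind can be sketched as a small data class with a sign-off check. Every input below is hypothetical, and the simplifications are assumptions rather than a method this article prescribes: one active user per shipped unit, hardware and engineering paid up front, the model-update cost spread evenly over the device lifetime, and a flat monthly cloud saving per user.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UnitEconomics:
    units: int
    bom_premium_per_unit: float         # dollars per device
    engineering_cost: float             # dollars, paid before launch
    model_update_cost: float            # dollars over the whole device lifetime
    cloud_saving_per_user_month: float  # avoided cloud spend, dollars
    lifetime_months: int = 36

    @property
    def upfront_cost(self) -> float:
        return self.bom_premium_per_unit * self.units + self.engineering_cost

    @property
    def net_monthly_saving(self) -> float:
        # Avoided cloud spend minus the update cost, spread evenly per month.
        avoided = self.cloud_saving_per_user_month * self.units
        return avoided - self.model_update_cost / self.lifetime_months

    def break_even_month(self) -> Optional[float]:
        """Months to recover the up-front cost, or None if it never pays back."""
        if self.net_monthly_saving <= 0:
            return None
        return self.upfront_cost / self.net_monthly_saving

    def deserves_sign_off(self, limit_months: float = 18) -> bool:
        months = self.break_even_month()
        return months is not None and months <= limit_months


# Hypothetical standard-ARM launch.
plan = UnitEconomics(units=100_000, bom_premium_per_unit=12,
                     engineering_cost=400_000, model_update_cost=300_000,
                     cloud_saving_per_user_month=1.0)
print(round(plan.break_even_month(), 1), plan.deserves_sign_off())
```

With these invented inputs the plan breaks even at about 17.5 months and passes the 18-month test; raising the per-unit premium to 40 dollars pushes break-even to 48 months and fails it.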

Demand 2: a named hardware tier with sourcing risk. The vendor names the chip class, the memory budget, and the storage allocation. The vendor identifies the supply chain risk if that chip becomes scarce. A vendor who pitches on-device AI without a backup hardware tier is exposing the company to a hardware sourcing risk that the supply chain team will surface six months later.

Demand 3: a defensible model size and accuracy benchmark. The vendor shows the model running the actual feature at the actual accuracy target on the proposed hardware tier. Not a similar feature. Not a similar model. The feature, the model, the hardware. Benchmarks on a different model or different hardware do not transfer to the production shipment.

Demand 4: an update strategy that survives the device lifetime. The vendor describes how the model will be kept current over 36 months. Over-the-air model updates need a hardware tier sized for two model generations ahead. A model freeze needs a competitive argument for why the year-three feature still wins. A vendor who has not picked one of those paths is selling a 12-month product into a 36-month device.

These four demands turn the conversation from "should we add AI" into "what is the unit-economics model that supports a 100,000-unit launch." A vendor who can answer that question is the one to short-list. The evaluation guide for mobile vendors on on-device AI capability covers the vendor-side scorecard for the rest of the engagement.

How Wednesday scopes on-device AI shipments

Wednesday scopes on-device AI shipments by starting from the unit-economics model and working backward into the engineering plan. The first 30-minute call covers the feature, the target unit count, the hardware tier the company can ship, and the inference frequency forecast. Wednesday returns inside two business days with a one-page margin scoping note: BOM premium, engineering cost, avoided cloud cost, model update strategy, and break-even window.

The Off Grid build is the proof point. The product had to run on-device chat, image generation, voice, and vision across iOS, Android, and macOS, with no cloud dependency. The team selected the model class, the hardware tier, and the update strategy at scoping time. Off Grid shipped to 50,000 users with zero paid spend on cloud inference. The unit economics model that Wednesday wrote at scoping survived the launch without revision.

The same pattern applies to enterprise hardware shipments. Wednesday writes the four-decision model in scoping. The engineering team executes against it during the build. The CFO signs off on a margin number that does not change after the contract is signed. A CTO who wants a defensible margin for an on-device AI launch at 100,000-unit scale should expect that level of unit-economics discipline from the vendor.

Send Wednesday your target unit count, the AI feature on the roadmap, and the hardware tier ceiling. You leave the call with a one-page margin scoping note and an itemised build range.

Frequently asked questions

Is on-device AI cheaper than cloud AI when shipping 100,000 devices?

On-device AI is cheaper than cloud at 100,000 units in most cases, but only if four cost decisions go right. Model size has to fit the cheapest hardware tier the product can ship. Inference frequency has to be high enough that the avoided cloud cost exceeds the engineering and hardware premium. The update strategy has to keep models current without rotating the device fleet. And the AI feature has to genuinely need to run offline; otherwise the cloud math wins. Skipping any of these turns the margin upside down inside the first year, and the hardware is already in the field by the time the CFO finds out.

How much does on-device AI add to the bill of materials on a 100,000-unit shipment?

On-device AI adds 4 dollars to 80 dollars per unit to the bill of materials at 100,000-unit scale, depending on the hardware tier. A standard ARM chip running a 1B-parameter model adds 4 to 12 dollars per unit. A neural-accelerator chip running a 7B-parameter model adds 25 to 60 dollars per unit. A high-end neural processor for 13B-parameter models or multi-modal inference adds 60 to 80 dollars per unit. Hardware tier is the single largest cost decision in the project, and reframing the feature to fit a smaller model is the most powerful cost lever a product team has.

When does cloud AI win the cost comparison at 100,000-unit shipment scale?

Cloud AI wins at 100,000 units when inference frequency is below roughly 100 queries per user per month and the AI feature does not need to run offline. Below that frequency, the avoided cloud cost is too small to cover the on-device hardware premium across 100,000 units. Hybrid models that run small inference locally and route long-context requests to the cloud are usually the right call in that range. Pure on-device wins only when offline operation is non-negotiable or query volume is high enough to amortise the hardware premium across the device lifetime.

What should a CFO demand from a mobile vendor pitching on-device AI at hardware shipment scale?

Demand a unit-economics model with the four cost decisions named and priced. The model has to show the bill-of-materials premium per device, the engineering cost amortised across the shipment count, the inference cost avoided over a 36-month device lifetime, and the model-update cost over the same window. A vendor who pitches on-device AI without that model has not shipped this before. The model should reach break-even inside 18 months for a 100,000-unit shipment, or the project does not deserve sign-off.

About the author

Mohammed Ali Chherawalla

Co-founder & CRO, Wednesday Solutions

Mac co-founded Wednesday Solutions and has shipped mobile apps used by more than 10 million people, written APIs that take over a billion calls a day, and architected systems that have driven hundreds of millions in revenue across fintech and logistics. He is one of the leading practitioners of on-device AI for enterprise mobile and the creator of Off Grid, one of the top on-device AI applications in the world. He now leads commercial strategy at Wednesday while staying close to architecture, AI enablement, and vendor evaluation for enterprise clients.

Shipped for enterprise and growth teams across the US, Europe, and Asia

American Express
Visa
Discover
EY
Smarsh
Kalshi
BuildOps
Kunai
Allen Digital
Ninjavan
Kotak Securities
Rapido
PharmEasy
PayU
Simpl
Docon
Nymble
SpotAI
Zalora
Velotio
Capital Float
Buildd
Kalsi