Case study
50,000 users, 99% crash-free: Wednesday built the most complete on-device AI app in the world
Off Grid is Wednesday's live proof that every major AI capability can run on a phone without touching a server. Language models, image generation, voice, document search. All on the device. Built in 12 weeks.
AI / On-device ML · The most complete on-device AI app in the world · Global — iOS, Android, macOS
The challenge
On-device AI is the right answer. It is also the hardest thing to build in mobile.
Every organization using cloud AI today is making the same trade: capability in exchange for data. Every query an employee sends to a cloud model leaves the device, reaches a server the organization does not control, and may be used to train models on data the organization never agreed to share. "We don't train on your data" is a policy statement in a terms-of-service document. Policy statements change on acquisition, on leadership change, on the next update with a 30-day notice. The only answer that holds is architectural: if the data never leaves the device, no policy change can expose it.
There is a second problem. Cloud inference bills on consumption. Every query is a line item. As AI becomes a daily tool across an organization, that line item scales with every employee who adopts it. On-device inference runs on hardware the organization already owns. The 10,000th query costs the same as the first.
Both become non-issues once the engineering is in place. The engineering is what stops most teams. Models weigh 2-7 gigabytes and load into memory shared with every other app on the device. The phone's dedicated AI chip works differently on Snapdragon hardware than on Apple Silicon, and each needs its own inference pipeline. Document search requires local indexing and retrieval with no cloud database. Image generation runs a separate model pipeline on the same device as the language model. Most mobile teams have never built any of this. The gap between installing an AI library and shipping on-device AI that holds stable at scale is months of specialist work.
Wednesday built Off Grid to close that gap and prove it is closable. 50,000 users. 99% crash-free. 78 releases in 12 weeks. The question of whether on-device AI can be built and kept stable at real scale has a documented, public answer.
“The only guarantee is architecture, not policy.”
The approach
Every hard problem in on-device ML, solved in one app.
Wednesday built Off Grid in React Native with native inference modules in Kotlin and Swift. The application layer is shared across iOS and Android. The inference engines are native, because the hardware budget leaves no room for abstraction overhead.
Language model inference. Qwen 3, Llama 3.2, Gemma 3, and Phi-4 run at 15-30 tokens per second on current flagship hardware. Multi-gigabyte models load from storage into active memory. Wednesday's memory management layer keeps the process within OS memory limits across multi-hour sessions: no OS kills, no degradation, no crashes. Model loading is the easy part. Keeping a loaded model stable across foreground, background, and low-memory conditions is the engineering most teams don't anticipate.
Image generation. Stable Diffusion runs on the phone's dedicated AI chip. On Snapdragon hardware: 5-10 seconds, using Qualcomm's acceleration layer. On Apple Silicon: equivalent via Core ML. Two different acceleration frameworks. Two different programming models. One unified experience. Most teams pick one platform. Wednesday built both.
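The dual-backend design reduces to one dispatch point behind one interface. The sketch below is a hypothetical illustration of that shape, not Off Grid's API; the real back ends wrap Qualcomm's acceleration layer and Core ML respectively, and return image data rather than strings.

```typescript
// Sketch of a unified interface over two platform back ends. All names
// hypothetical; `generate` returns a string here only to keep the sketch small.

interface DiffusionBackend {
  name: string;
  generate(prompt: string): string;
}

const qnnBackend: DiffusionBackend = {
  name: "snapdragon-qnn",
  generate: (prompt) => `[qnn image for "${prompt}"]`,
};

const coreMlBackend: DiffusionBackend = {
  name: "apple-coreml",
  generate: (prompt) => `[coreml image for "${prompt}"]`,
};

// One selection point; the rest of the app never branches on platform.
function pickBackend(platform: "android" | "ios" | "macos"): DiffusionBackend {
  return platform === "android" ? qnnBackend : coreMlBackend;
}
```

Confining the platform branch to a single selection point is what lets two acceleration frameworks with different programming models present one experience to the shared application layer.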
Document search. Users upload documents: contracts, research, clinical reference material, internal knowledge bases. The app indexes them locally and retrieves by relevance at query time. No cloud service processes the text. Nothing leaves the device. The same pipeline has been independently validated in peer-reviewed research for clinical dermatology reference material. On-device document retrieval in a regulated healthcare context, with no network dependency.
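A minimal sketch of the index-and-retrieve loop, using term-frequency vectors in place of a real embedding model. This is not Off Grid's pipeline, only the shape of it: index at upload time, rank by similarity at query time, all in process.

```typescript
// Toy on-device retrieval: term-frequency vectors + cosine similarity.
// A production pipeline would swap in a real embedding model.

function vectorize(text: string): Map<string, number> {
  const v = new Map<string, number>();
  for (const term of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    v.set(term, (v.get(term) ?? 0) + 1);
  }
  return v;
}

function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  for (const [term, wa] of a) dot += wa * (b.get(term) ?? 0);
  const norm = (m: Map<string, number>) =>
    Math.sqrt([...m.values()].reduce((s, w) => s + w * w, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Rank document chunks against the query; nothing leaves the process.
function retrieve(chunks: string[], query: string, k = 1): string[] {
  const q = vectorize(query);
  return chunks
    .map((chunk) => ({ chunk, score: cosine(vectorize(chunk), q) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((r) => r.chunk);
}
```

The top-k chunks become the context handed to the local language model, which is the retrieval-augmented generation pattern the clinical dermatology research validated.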
Voice and tools. On-device speech recognition transcribes voice input. The language model calls built-in tools in automatic loops: web search, calculator, knowledge base retrieval. Devices on the same local network share inference capacity without cloud involvement. A larger model on a desktop serves a phone on the same WiFi, with no data leaving the local network.
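The automatic tool loop can be sketched as follows. Everything here is a hypothetical stand-in, not Off Grid's implementation: `fakeModel` plays the role of local model inference, and `calculator` stands in for the built-in tools.

```typescript
// Sketch of an automatic tool-call loop: the model either answers or asks for
// a tool; the runtime executes the tool and feeds the result back until the
// model produces a final answer. All names hypothetical.

type ModelTurn =
  | { kind: "tool"; name: string; input: string }
  | { kind: "answer"; text: string };

const tools: Record<string, (input: string) => string> = {
  calculator: (expr) => {
    const [a, op, b] = expr.split(" ");
    const x = Number(a);
    const y = Number(b);
    return String(op === "+" ? x + y : op === "*" ? x * y : NaN);
  },
};

function runToolLoop(
  model: (history: string[]) => ModelTurn,
  userQuery: string,
  maxSteps = 5,
): string {
  const history = [userQuery];
  for (let i = 0; i < maxSteps; i++) {
    const turn = model(history);
    if (turn.kind === "answer") return turn.text;
    const tool = tools[turn.name];
    if (!tool) return `unknown tool: ${turn.name}`;
    history.push(`${turn.name} → ${tool(turn.input)}`);
  }
  return "step limit reached";
}

// A toy model: asks the calculator once, then answers with its result.
const fakeModel = (history: string[]): ModelTurn =>
  history.length === 1
    ? { kind: "tool", name: "calculator", input: "6 * 7" }
    : { kind: "answer", text: `The result is ${history[1].split("→ ")[1]}` };

runToolLoop(fakeModel, "what is 6 times 7"); // → "The result is 42"
```

The step limit matters in practice: it bounds the loop when a small local model keeps requesting tools without converging on an answer.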
78 releases, no broken ones. Every build clears a full automated test gate before it ships. The 99% crash-free rate held across all 78 releases because the test infrastructure caught inference failures before they reached users.
The results
50,000 users. 99% crash-free. No cloud. No data exposure.
50,000 users in 12 weeks across the US, UK, India, Germany, and Canada. Every device runs full AI inference locally. Every query is processed on the device. Nothing reaches a server Wednesday or anyone else controls. The answer to "what data left the device?" is architectural, not contractual.
The crash-free rate held at 99% across all 78 releases. Language model inference, image generation, document search, and speech processing run in parallel on consumer hardware at the same stability standard as a conventional mobile app. The engineering that makes on-device AI auditable is the same engineering that makes it stable at scale.
The on-device document search approach has been independently validated in peer-reviewed research: "Towards Empowering the Offline Clinician: A Method for Enhancing Dermatology Reference Material Utility through Mobile Edge AI-Based Retrieval-Augmented Generation." The same architecture running in Off Grid has been scrutinized in a regulated clinical context and holds.
1,700 engineers have starred the project on GitHub. Any technical team can read the architecture before a first conversation with Wednesday. That is not a sales advantage. It is a diligence shortcut.
Cloud inference bills on consumption. At 50,000 users running multi-turn sessions daily, the cost on a standard cloud AI API would be significant and compounding. Off Grid's inference cost is zero. It runs on hardware users already own. Wednesday built this. Wednesday can build the version your organization needs.
ROI
Cloud AI inference scales with usage. Every query is a line item that grows with headcount and adoption. On-device inference runs on hardware the organization already owns. The marginal cost of the 10,000th query is identical to the first. Off Grid proved the model holds at 50,000 users with zero server infrastructure cost.
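The cost argument reduces to two curves. The per-query price below is a hypothetical placeholder, not a quoted cloud rate; only the shapes matter.

```typescript
// Back-of-the-envelope cost shapes. $0.002 per query is a hypothetical
// placeholder for illustration, not an actual cloud price.

function cloudCost(queries: number, pricePerQuery = 0.002): number {
  return queries * pricePerQuery; // a line item that grows with adoption
}

function onDeviceCost(_queries: number): number {
  return 0; // inference runs on hardware users already own
}
```

Under any positive per-query price, the cloud curve compounds with headcount and adoption while the on-device curve stays flat, which is the sense in which the 10,000th query costs the same as the first.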
Run the numbers
See what these results would look like for your team size and budget.
Next step
Evaluating on-device AI for your mobile product?
30 minutes with an engineer. Bring your current setup and your deadline. You leave with a squad shape and a written burn estimate.
