Mobile AI Mandate Scorecard: Assessing Vendor Readiness
A structured scorecard for evaluating whether a mobile vendor can deliver on an AI mandate. Six dimensions, each with a pass threshold and a decision rule.
Vendor AI claims are easy to make and hard to verify without a structured framework. This scorecard covers six dimensions of AI delivery readiness - each with a specific evidence requirement and a three-point scale. Run it in a 90-minute vendor assessment session and you will have a defensible score before you commit budget.
The scorecard does not replace the proof-of-concept phase for high-stakes projects. It tells you whether the proof-of-concept is worth running, and - if the vendor scores poorly - which specific gaps need to be addressed before the project starts.
How to use this scorecard
Score each dimension 0, 1, or 3, for a maximum score of 18. Use the reading guide at the end to interpret the result.
Run the assessment in a live vendor meeting. Do not send the scorecard in advance - you want to see how the vendor handles questions they did not prepare for, not a polished set of rehearsed answers.
Assign one of three scores to each dimension after the vendor assessment conversation.
0 points: The vendor could not answer the question or produced no evidence.
1 point: The vendor described a capability but could not produce an artifact, reference, or specific example to verify it.
3 points: The vendor produced specific, verifiable evidence - an artifact, a named reference, a demonstrable workflow output.
There is no 2-point score. Partial credit for unverified claims produces scores that overestimate vendor capability. The gap between a claim and a demonstrable example is the most important gap the scorecard measures.
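If you tally scores in a script or spreadsheet, the no-2 rule is worth enforcing mechanically so a tempting middle score never creeps in. A minimal Python sketch - the function name and structure are illustrative, not part of the scorecard:

```python
# Valid scores per dimension. There is deliberately no 2: an unverified
# claim earns a 1, and only demonstrated evidence earns a 3.
ALLOWED_SCORES = {0, 1, 3}

def record_score(dimension: str, score: int) -> int:
    """Reject any score outside the 0/1/3 scale."""
    if score not in ALLOWED_SCORES:
        raise ValueError(f"{dimension}: {score} is invalid; use 0, 1, or 3")
    return score
```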
Dimension one: shipped AI track record
The question: Name a production AI feature you have delivered in a mobile app. Not a prototype. A feature that is live today, with real users.
What a 3 looks like: The vendor names a specific feature in a specific app, describes the model it uses, and can tell you the latency in production and one metric that shows it is performing. You can verify it by downloading the app.
What a 1 looks like: The vendor describes an AI project they worked on but it is internal tooling, a prototype, or a pilot that was not shipped to production users.
What a 0 looks like: The vendor cannot name a shipped AI feature in a live app.
Dimension two: AI workflow maturity
The question: Show me an AI workflow output from a recent project.
What a 3 looks like: The vendor produces one of: an AI-generated code review comment from a recent release, a sample of AI-generated release notes from a shipped version, or a screenshot regression comparison showing AI-detected visual differences between builds.
What a 1 looks like: The vendor describes the tools they use - Copilot, a code review tool, an AI testing platform - but cannot produce an output artifact in the conversation.
What a 0 looks like: The vendor describes AI in general terms without naming specific tools or showing any evidence of AI-integrated workflows.
Dimension three: on-device capability
The question: Have you built a feature that runs AI inference on-device, without requiring an internet connection?
What a 3 looks like: The vendor names a specific on-device AI feature, describes the model format used (Core ML, TensorFlow Lite, ONNX, or similar), the device constraints they optimized for, and the latency and accuracy in production.
What a 1 looks like: The vendor has explored on-device AI but has not shipped a production on-device feature. They can describe the technical approach but have no production reference.
What a 0 looks like: The vendor has not built on-device AI features and cannot describe a concrete approach.
Note: if your AI mandate is for cloud-connected features only, this dimension carries less weight. Adjust accordingly.
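If a vendor hands you a .tflite artifact, you can sanity-check a latency claim in a few lines. A minimal sketch, assuming TensorFlow is installed and using a hypothetical model file name; this measures latency on your workstation, so treat it as a smoke test, not the on-device production number:

```python
import time

import numpy as np
import tensorflow as tf

# Hypothetical artifact name - substitute the vendor's actual model file.
interpreter = tf.lite.Interpreter(model_path="feature_model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

# Random input with the right shape and dtype is enough for a latency check.
dummy = np.random.random_sample(tuple(inp["shape"])).astype(inp["dtype"])

latencies_ms = []
for _ in range(100):
    interpreter.set_tensor(inp["index"], dummy)
    start = time.perf_counter()
    interpreter.invoke()
    latencies_ms.append((time.perf_counter() - start) * 1000)

latencies_ms.sort()
print(f"p50: {latencies_ms[49]:.1f} ms  p95: {latencies_ms[94]:.1f} ms")
```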
If you want a structured vendor assessment for an AI mandate and would like a second opinion on the results, a 30-minute call covers the scorecard and what the scores mean for your specific project.
Book my call →
Dimension four: compliance readiness
The question: Have you shipped an AI feature in a regulated environment - financial services, healthcare, or edtech with minors' data?
What a 3 looks like: The vendor describes a specific regulated environment project, how they handled the privacy review for the AI data flows, what the App Store review looked like, and whether they worked with a CISO or compliance officer on the feature approval.
What a 1 looks like: The vendor has built in regulated environments but not with AI features. They understand compliance requirements but have not navigated the AI-specific additions.
What a 0 looks like: The vendor has not built in regulated environments and cannot describe the compliance requirements relevant to your industry.
Note: if your app is not in a regulated industry, this dimension carries less weight.
Dimension five: team composition
The question: Who specifically will work on the AI features, and what have they built?
What a 3 looks like: The vendor names specific engineers with AI integration experience, describes what each person has built, and confirms that the team includes at least one person with on-device or cloud AI integration experience in a production app.
What a 1 looks like: The vendor describes general team expertise without naming specific people or specific projects. Or they plan to hire AI expertise for this project rather than drawing on an existing team member.
What a 0 looks like: The vendor cannot name specific team members for the AI work and has no clear plan for who will own the AI integration.
Dimension six: cost modeling
The question: How would you model the ongoing cost of this AI feature at scale?
What a 3 looks like: The vendor describes the inference cost structure for the feature type - per-query costs for cloud features, update cycle costs for on-device features - and produces a rough model showing cost at current usage and projected usage in 12 months.
What a 1 looks like: The vendor acknowledges that inference costs exist but cannot model them for your specific use case or usage projections.
What a 0 looks like: The vendor has not considered ongoing inference costs or treats the development cost as the total cost of the feature.
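As a reference point for what "a rough model" means, here is a minimal sketch of a per-query cloud cost projection. Every number in it is an illustrative assumption - substitute your own rates, token counts, and usage figures:

```python
# Illustrative assumptions only - not benchmarks or quoted prices.
PRICE_PER_1K_TOKENS = 0.002        # assumed blended input+output rate, USD
TOKENS_PER_QUERY = 1_500           # assumed average prompt + completion size
QUERIES_PER_USER_PER_MONTH = 20    # assumed feature engagement

def monthly_cost(monthly_active_users: int) -> float:
    """Project monthly inference spend from usage assumptions."""
    queries = monthly_active_users * QUERIES_PER_USER_PER_MONTH
    return queries * TOKENS_PER_QUERY / 1000 * PRICE_PER_1K_TOKENS

current = monthly_cost(50_000)
projected = monthly_cost(int(50_000 * 1.05 ** 12))  # assumes 5% monthly growth
print(f"current: ${current:,.0f}/mo, in 12 months: ${projected:,.0f}/mo")
```

A vendor who scores a 3 produces something at this level of specificity, with rates and usage grounded in your actual feature, in the room.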
Reading the score
15 to 18: The vendor has demonstrated readiness across all six dimensions. Proceed with contract negotiation. A proof-of-concept is optional, not required.
10 to 14: The vendor has demonstrated readiness in most dimensions. Identify which dimensions scored below 3 and determine whether they are critical to your specific AI project. For high-stakes mandates, run a six-week proof-of-concept before committing the full budget.
6 to 9: The vendor has significant gaps in AI delivery readiness. Proceeding without addressing the gaps is high-risk. Options: bring in a specialist to fill the specific gaps, run a structured proof-of-concept with a tight decision gate, or evaluate alternative vendors.
Below 6: The vendor is not ready for the AI mandate. Evaluate alternative vendors before committing.
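Putting the scale and the reading guide together, a minimal sketch of the interpretation step - the band labels are shorthand for the guidance above:

```python
def read_score(scores: list[int]) -> str:
    """Turn six dimension scores into a reading per the bands above."""
    if len(scores) != 6 or any(s not in (0, 1, 3) for s in scores):
        raise ValueError("expected six scores, each 0, 1, or 3")
    total = sum(scores)
    if total >= 15:
        return f"{total}/18: ready - proceed to contract, PoC optional"
    if total >= 10:
        return f"{total}/18: mostly ready - probe sub-3 dimensions, PoC for high stakes"
    if total >= 6:
        return f"{total}/18: significant gaps - specialist, gated PoC, or other vendors"
    return f"{total}/18: not ready - evaluate alternative vendors"

print(read_score([3, 3, 1, 3, 3, 1]))  # 14/18: mostly ready - ...
```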
Wednesday has delivered AI features for enterprise mobile teams across financial services, healthcare, edtech, and logistics. A 30-minute call covers how we score on this framework and what the engagement looks like.
Book my call →
The writing archive has vendor comparison guides, cost benchmarks, and decision frameworks for every stage of the enterprise mobile buying process.
Read more decision guides →
About the author
Rameez Khan
LinkedIn →
Head of Delivery, Wednesday Solutions
Rameez has shipped mobile products at scale across on-demand logistics, entertainment, and edtech, and has led enterprise AI enablement across multiple Wednesday engagements. As Head of Delivery at Wednesday Solutions, he oversees how every engagement is scoped, staffed, and run from first build to production.
Four weeks from this call, a Wednesday squad is shipping your mobile app. 30 minutes confirms the team shape and start date.
Get your start date →