Mobile AI Mandate Scorecard: Assessing Vendor Readiness
A structured scorecard for evaluating whether a mobile vendor can deliver on an AI mandate. Six dimensions, each with a pass threshold and a decision rule.
Vendor AI claims are easy to make and hard to verify without a structured framework. This scorecard covers six dimensions of AI delivery readiness - each with a specific evidence requirement and a three-point scale. Run it in a 90-minute vendor assessment session and you will have a defensible score before you commit budget.
The scorecard does not replace the proof-of-concept phase for high-stakes projects. It tells you whether the proof-of-concept is worth running, and - if the vendor scores poorly - which specific gaps need to be addressed before the project starts.
How to use this scorecard
Score each dimension 0, 1, or 3, for a maximum score of 18. Use the reading guide at the end to interpret the result.
Run the assessment in a live vendor meeting. Do not send the scorecard in advance - you want to see how the vendor handles questions they did not prepare for, not a polished set of rehearsed answers.
Assign one of three scores to each dimension after the vendor assessment conversation.
0 points: The vendor could not answer the question or produced no evidence.
1 point: The vendor described a capability but could not produce an artifact, reference, or specific example to verify it.
3 points: The vendor produced specific, verifiable evidence - an artifact, a named reference, a demonstrable workflow output.
There is no 2-point score. Partial credit for unverified claims produces scores that overestimate vendor capability. The gap between a claim and a demonstrable example is the most important gap the scorecard measures.
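If you tally scores in a script or spreadsheet, the no-2 rule is worth enforcing mechanically so a tempting middle score never creeps in. A minimal Python sketch - the function name and structure are illustrative, not part of the scorecard:

```python
# Valid scores per dimension. There is deliberately no 2: an unverified
# claim earns a 1, and only demonstrated evidence earns a 3.
ALLOWED_SCORES = {0, 1, 3}

def record_score(dimension: str, score: int) -> int:
    """Reject any score outside the 0/1/3 scale."""
    if score not in ALLOWED_SCORES:
        raise ValueError(f"{dimension}: {score} is invalid; use 0, 1, or 3")
    return score
```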
Dimension one: shipped AI track record
The question: Name a production AI feature you have delivered in a mobile app. Not a prototype. A feature that is live today, with real users.
What a 3 looks like: The vendor names a specific feature in a specific app, describes the model it uses, and can tell you the latency in production and one metric that shows it is performing. You can verify it by downloading the app.
What a 1 looks like: The vendor describes an AI project they worked on but it is internal tooling, a prototype, or a pilot that was not shipped to production users.
What a 0 looks like: The vendor cannot name a shipped AI feature in a live app.
Dimension two: AI workflow maturity
The question: Show me an AI workflow output from a recent project.
What a 3 looks like: The vendor produces one of: an AI-generated code review comment from a recent release, a sample of AI-generated release notes from a shipped version, or a screenshot regression comparison showing AI-detected visual differences between builds.
What a 1 looks like: The vendor describes the tools they use - Copilot, a code review tool, an AI testing platform - but cannot produce an output artifact in the conversation.
What a 0 looks like: The vendor describes AI in general terms without naming specific tools or showing any evidence of AI-integrated workflows.
Dimension three: on-device capability
The question: Have you built a feature that runs AI inference on-device, without requiring an internet connection?
What a 3 looks like: The vendor names a specific on-device AI feature, describes the model format used (Core ML, TensorFlow Lite, ONNX, or similar), the device constraints they optimized for, and the latency and accuracy in production.
What a 1 looks like: The vendor has explored on-device AI but has not shipped a production on-device feature. They can describe the technical approach but have no production reference.
What a 0 looks like: The vendor has not built on-device AI features and cannot describe a concrete approach.
Note: if your AI mandate is for cloud-connected features only, this dimension carries less weight. Adjust accordingly.
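If a vendor hands you a .tflite artifact, you can sanity-check a latency claim in a few lines. A minimal sketch, assuming TensorFlow is installed and using a hypothetical model file name; this measures latency on your workstation, so treat it as a smoke test, not the on-device production number:

```python
import time

import numpy as np
import tensorflow as tf

# Hypothetical artifact name - substitute the vendor's actual model file.
interpreter = tf.lite.Interpreter(model_path="feature_model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

# Random input with the right shape and dtype is enough for a latency check.
dummy = np.random.random_sample(tuple(inp["shape"])).astype(inp["dtype"])

latencies_ms = []
for _ in range(100):
    interpreter.set_tensor(inp["index"], dummy)
    start = time.perf_counter()
    interpreter.invoke()
    latencies_ms.append((time.perf_counter() - start) * 1000)

latencies_ms.sort()
print(f"p50: {latencies_ms[49]:.1f} ms  p95: {latencies_ms[94]:.1f} ms")
```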
If you want a structured vendor assessment for an AI mandate and would like a second opinion on the results, a 30-minute call covers the scorecard and what the scores mean for your specific project.
Book my call →
Dimension four: compliance readiness
The question: Have you shipped an AI feature in a regulated environment - financial services, healthcare, or edtech with minors' data?
What a 3 looks like: The vendor describes a specific regulated environment project, how they handled the privacy review for the AI data flows, what the App Store review looked like, and whether they worked with a CISO or compliance officer on the feature approval.
What a 1 looks like: The vendor has built in regulated environments but not with AI features. They understand compliance requirements but have not navigated the AI-specific additions.
What a 0 looks like: The vendor has not built in regulated environments and cannot describe the compliance requirements relevant to your industry.
Note: if your app is not in a regulated industry, this dimension carries less weight.
Dimension five: team composition
The question: Who specifically will work on the AI features, and what have they built?
What a 3 looks like: The vendor names specific engineers with AI integration experience, describes what each person has built, and confirms that the team includes at least one person with on-device or cloud AI integration experience in a production app.
What a 1 looks like: The vendor describes general team expertise without naming specific people or specific projects. Or they plan to hire AI expertise for this project rather than drawing on an existing team member.
What a 0 looks like: The vendor cannot name specific team members for the AI work and has no clear plan for who will own the AI integration.
Dimension six: cost modeling
The question: How would you model the ongoing cost of this AI feature at scale?
What a 3 looks like: The vendor describes the inference cost structure for the feature type - per-query costs for cloud features, update cycle costs for on-device features - and produces a rough model showing cost at current usage and projected usage in 12 months.
What a 1 looks like: The vendor acknowledges that inference costs exist but cannot model them for your specific use case or usage projections.
What a 0 looks like: The vendor has not considered ongoing inference costs or treats the development cost as the total cost of the feature.
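As a reference point for what "a rough model" means, here is a minimal sketch of a per-query cloud cost projection. Every number in it is an illustrative assumption - substitute your own rates, token counts, and usage figures:

```python
# Illustrative assumptions only - not benchmarks or quoted prices.
PRICE_PER_1K_TOKENS = 0.002        # assumed blended input+output rate, USD
TOKENS_PER_QUERY = 1_500           # assumed average prompt + completion size
QUERIES_PER_USER_PER_MONTH = 20    # assumed feature engagement

def monthly_cost(monthly_active_users: int) -> float:
    """Project monthly inference spend from usage assumptions."""
    queries = monthly_active_users * QUERIES_PER_USER_PER_MONTH
    return queries * TOKENS_PER_QUERY / 1000 * PRICE_PER_1K_TOKENS

current = monthly_cost(50_000)
projected = monthly_cost(int(50_000 * 1.05 ** 12))  # assumes 5% monthly growth
print(f"current: ${current:,.0f}/mo, in 12 months: ${projected:,.0f}/mo")
```

A vendor who scores a 3 produces something at this level of specificity, with rates and usage grounded in your actual feature, in the room.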
Reading the score
15 to 18: The vendor has demonstrated readiness across all six dimensions. Proceed with contract negotiation. A proof-of-concept is optional, not required.
10 to 14: The vendor has demonstrated readiness in most dimensions. Identify which dimensions scored below 3 and determine whether they are critical to your specific AI project. For high-stakes mandates, run a six-week proof-of-concept before committing the full budget.
6 to 9: The vendor has significant gaps in AI delivery readiness. Proceeding without addressing the gaps is high-risk. Options: bring in a specialist to fill the specific gaps, run a structured proof-of-concept with a tight decision gate, or evaluate alternative vendors.
Below 6: The vendor is not ready for the AI mandate. Evaluate alternative vendors before committing.
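Putting the scale and the reading guide together, a minimal sketch of the interpretation step - the band labels are shorthand for the guidance above:

```python
def read_score(scores: list[int]) -> str:
    """Turn six dimension scores into a reading per the bands above."""
    if len(scores) != 6 or any(s not in (0, 1, 3) for s in scores):
        raise ValueError("expected six scores, each 0, 1, or 3")
    total = sum(scores)
    if total >= 15:
        return f"{total}/18: ready - proceed to contract, PoC optional"
    if total >= 10:
        return f"{total}/18: mostly ready - probe sub-3 dimensions, PoC for high stakes"
    if total >= 6:
        return f"{total}/18: significant gaps - specialist, gated PoC, or other vendors"
    return f"{total}/18: not ready - evaluate alternative vendors"

print(read_score([3, 3, 1, 3, 3, 1]))  # 14/18: mostly ready - ...
```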
Wednesday has delivered AI features for enterprise mobile teams across financial services, healthcare, edtech, and logistics. A 30-minute call covers how we score on this framework and what the engagement looks like.
Book my call →
The writing archive has vendor comparison guides, cost benchmarks, and decision frameworks for every stage of the enterprise mobile buying process.
Read more decision guides →
About the author
Rameez Khan
LinkedIn →
Head of Delivery, Wednesday Solutions
Rameez has shipped mobile products at scale across on-demand logistics, entertainment, and edtech, and has led enterprise AI enablement across multiple Wednesday engagements. As Head of Delivery at Wednesday Solutions, he oversees how every engagement is scoped, staffed, and run from first build to production.
Four weeks from this call, a Wednesday squad is shipping your mobile app. 30 minutes confirms the team shape and start date.
Get your start date →