What to Ask a Mobile Vendor About Their AI Workflow Before You Sign: 2026 Guide for US Enterprise
Eight questions that separate vendors with genuine AI-augmented delivery from vendors with AI-augmented marketing. Includes the exact follow-up probes and what the answers reveal about infrastructure, team continuity, and delivery risk.
In this article
- How to use this guide
- Q1: Can you show me velocity data?
- Q2: How do you handle UI regressions?
- Q3: What does your code review produce?
- Q4: How quickly can you start?
- Q5: What do I own at the end?
- Q6: What is your release cadence?
- Q7: How do you generate documentation?
- Q8: What if a key engineer leaves?
- Vendor scorecard
- Frequently asked questions
Eight questions decide whether a mobile vendor can deliver what they claim. Not eight weeks of evaluation. Eight questions, asked in order, each with a specific follow-up probe. The answers either produce an artifact you can evaluate or they do not. The difference between a vendor with genuine AI-augmented delivery and one with AI-augmented marketing becomes clear within 90 minutes.
The questions are designed around a simple principle: a vendor who has built the infrastructure can show you its output. A vendor who has not built it will describe what they intend to build. Both answers look similar in a proposal deck. They look very different when you ask for a sample log, a recent report, or a data chart from the last eight weeks.
Key findings
Eight questions with specific follow-up probes separate vendors with genuine AI delivery infrastructure from vendors with marketing claims. The difference becomes visible in 90 minutes.
The most informative questions are Q1 (velocity data), Q2 (regression handling), and Q3 (code review artifacts). These three reveal whether the foundational infrastructure exists.
Questions about onboarding time, ownership, and engineer departure scenarios reveal delivery risk that most buyers do not ask about until after a problem occurs.
A vendor with genuine AI-augmented delivery can produce every artifact requested in this guide within 24 hours of the ask. If they need longer than that, the process is not embedded in their daily workflow.
How to use this guide
Run these questions in two sessions. In the first session (a 30-minute call), ask Q1 through Q3 and request the corresponding artifacts. Use the artifact responses to decide which vendors proceed to the second session. In the second session (60 minutes), cover Q4 through Q8 with the vendors who produced strong artifacts in the first round.
The artifact request is the filter. "We do that" is not a passing response to any of these questions. "Here is the output from the last three engagements" is.
Q1: Can you show me velocity data from your last three engagements?
The question: "Can you share velocity data from your last three enterprise engagements? Features shipped per week, trend over time, and how that compares to your initial plan."
Weak answer: "We move fast." Or: "We deliver on time." Or: "We use agile methodology." These describe an attitude or a process name, not a measurement.
Strong answer: A chart or table showing features completed per week over the engagement duration, with the planned rate as a comparison line. Ideally, it shows at least one period where delivery slowed and explains what happened and how it was resolved. A vendor who shows only good weeks is either omitting data or has only worked on easy engagements.
Follow-up probe: "What was the lowest-velocity week in one of these engagements, and what caused it?"
What it reveals: Whether the vendor measures delivery at all. AI-augmented teams that are actually 30 to 40% faster than traditional teams have the data to show it. Vendors who do not measure velocity cannot demonstrate acceleration, and they cannot give you early warning when an engagement is slowing down.
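To make "velocity data" concrete, here is a minimal sketch of what underlies such a chart: turning a list of feature ship dates into per-week counts that can be compared against a planned rate. The dates and planned rate below are hypothetical, purely for illustration:

```python
from collections import Counter
from datetime import date, timedelta

def weekly_velocity(ship_dates, start, weeks):
    """Count features shipped per week of the engagement.

    Returns (week_index, count) for every week, including zero-feature
    weeks -- the weeks worth probing with the follow-up question.
    """
    counts = Counter((d - start).days // 7 for d in ship_dates)
    return [(w, counts.get(w, 0)) for w in range(weeks)]

# Hypothetical 8-week engagement with a planned rate of 3 features/week.
start = date(2025, 9, 1)
shipped = [start + timedelta(days=n) for n in
           (2, 4, 5, 9, 11, 16, 17, 18, 23, 24, 30, 31, 37, 44, 45, 46, 51, 52)]
planned_per_week = 3
for week, actual in weekly_velocity(shipped, start, 8):
    flag = " <- below plan" if actual < planned_per_week else ""
    print(f"week {week + 1}: {actual} shipped (plan: {planned_per_week}){flag}")
```

A vendor with real measurement produces this kind of series, plan line included, without being asked twice; the low weeks are where the follow-up probe goes.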
Q2: How do you handle UI regressions?
The question: "Walk me through what happens when a change breaks a screen layout on a device you were not testing on."
Weak answer: "Our QA team catches regressions." Or: "We do thorough manual testing." Manual QA is better than nothing, but a team with manual-only regression testing is checking a fraction of the device-size combinations that enterprise users run on.
Strong answer: "We run automated screenshot regression against a defined device matrix after every change. Diffs are flagged before the change is approved. The engineer who made the change sees the flag and resolves it before it reaches QA. Here is a sample regression report from a recent release."
Follow-up probe: "What is your device matrix? Can you show me a recent regression report?"
What it reveals: Whether the vendor has invested in automated regression infrastructure. Building and maintaining a screenshot regression system for a full device matrix requires real effort. Vendors who have done it can describe their device list and produce a report. Vendors who have not will pivot to describing their QA process.
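For context on what a screenshot regression gate actually does, here is a minimal sketch: a per-pixel diff ratio over two captured frames, and a gate that flags devices whose diff exceeds a threshold. The device names, tolerance, and threshold are illustrative assumptions, not any vendor's actual matrix:

```python
def diff_ratio(baseline, candidate, tolerance=8):
    """Fraction of pixels whose max per-channel difference exceeds `tolerance`.

    Both screenshots are flat lists of (r, g, b) tuples of equal length.
    A small tolerance absorbs anti-aliasing noise between renders.
    """
    if len(baseline) != len(candidate):
        raise ValueError("screenshot dimensions differ")
    changed = sum(
        1 for a, b in zip(baseline, candidate)
        if max(abs(x - y) for x, y in zip(a, b)) > tolerance
    )
    return changed / len(baseline)

# Hypothetical device matrix with a 1% per-device failure threshold.
DEVICE_MATRIX = ["iPhone-SE", "iPhone-15-Pro", "Pixel-8", "Galaxy-Tab-S9"]

def gate(results, threshold=0.01):
    """results: {device: diff ratio}. Returns devices that block approval."""
    return [d for d in DEVICE_MATRIX if results.get(d, 0.0) > threshold]
```

Real systems use more robust perceptual comparison, but the shape is the same: every change is rendered across the matrix, diffed against a baseline, and blocked on any flagged device before QA ever sees it.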
Q3: What does your code review process produce?
The question: "For each change that ships, what artifacts does your code review process generate?"
Weak answer: "Our senior engineers review every change before it merges." This describes staffing. It says nothing about what the review produces.
Strong answer: "Every change produces a review log. The log includes the static analysis output, the AI-generated findings with severity classifications, the test coverage delta, and the human engineer's resolution decisions. We can export that log for any period and it maps directly to SOC 2 and HIPAA audit requirements. Here is a sample log."
Follow-up probe: "Can you show me a sample code review log from a recent change? I want to see what it looks like, not just hear about it."
What it reveals: Whether the vendor has AI-augmented review infrastructure or manual review with AI mentioned in the proposal. A review process that cannot produce a per-change log is not audit-ready. For any app touching regulated data (financial records, health information, user PII), the absence of a review log is a compliance gap.
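To make "per-change review log" concrete, here is a hypothetical example of the shape such a log entry might take. Every field name and value below is illustrative, not a vendor standard; the substance is that each change yields one machine-readable record an auditor can export by date range:

```python
import json

# Hypothetical per-change review log entry. The fields mirror the four
# artifacts named above: static analysis output, AI findings with
# severity, the test coverage delta, and the human resolution decisions.
review_log_entry = {
    "change_id": "PR-1042",
    "merged_at": "2025-11-03T14:22:00Z",
    "static_analysis": {"tool": "example-linter", "errors": 0, "warnings": 2},
    "ai_findings": [
        {"severity": "high",
         "summary": "possible nil dereference in session cache",
         "resolution": "fixed", "resolved_by": "human-reviewer"},
        {"severity": "low",
         "summary": "magic number in retry backoff",
         "resolution": "accepted-with-comment", "resolved_by": "human-reviewer"},
    ],
    "test_coverage_delta": "+1.4%",
    "approved_by": "senior-engineer",
}
print(json.dumps(review_log_entry, indent=2))
```

A process that emits something like this per merge can answer an auditor's "show me your review evidence for Q3" in minutes; a process that cannot is reconstructing evidence after the fact.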
Q4: How quickly can you start?
The question: "How long from contract signature to working software in our hands?"
Weak answer: "After the contract is signed, we will scope the engagement and begin." This describes a kickoff process, not a delivery timeline.
Strong answer: "First working software in four weeks. The first two weeks cover orientation and architecture alignment. Working software ships by the end of week four." A vendor who has done this repeatedly has a process for it. The four-week onboarding is a test of whether the vendor has systematized their start-up process or whether every engagement starts from scratch.
Follow-up probe: "What happens in weeks one through four specifically? What does the week-four deliverable look like?"
What it reveals: Whether the vendor has a repeatable onboarding process or ad-hoc startup. Enterprise buyers frequently lose four to eight weeks to slow onboarding. A vendor who has systematized the first four weeks, including orientation, architecture alignment, and first working software delivery, has demonstrated that they have run this process before and know how to compress the startup period.
Q5: What do I own at the end of the engagement?
The question: "When the engagement ends, what is on the list of things I own?"
Weak answer: "The app." Some vendors add "all the source code" as if that is a complete answer.
Strong answer: "You own the app, the full source, the test suite with documentation on what each test covers and how to run it, architecture decision records from every significant decision made during the engagement, an onboarding guide that lets a new engineer become productive in under two weeks without a knowledge transfer session, and the AI-generated release notes and review logs from every release cycle. Everything we produce on your engagement is yours."
Follow-up probe: "Can you show me an architecture decision record and an onboarding guide from a previous engagement as examples?"
What it reveals: How the vendor thinks about your long-term risk. A vendor who hands over the app and the source has given you the minimum. A vendor who hands over documentation that lets your internal team or a future vendor pick up the work without a six-week reconstruction project has given you continuity. The test suite and architecture documentation are the two elements most commonly missing at engagement end and the two most expensive to reconstruct.
Q6: What is your release cadence for similar enterprise engagements?
The question: "For an engagement similar in scope to ours, how often does the app ship to the App Store or Google Play?"
Weak answer: "We ship regularly." Or: "We ship when features are ready." These describe an attitude, not a cadence.
Strong answer: "For engagements at this scope, we ship weekly. We have App Store submission data from our last six months of engagements showing the release dates. Some weeks are maintenance releases, some are feature releases, but the App Store submission happens every week. Here is the data."
Follow-up probe: "Can you share App Store submission dates from a similar engagement over the last three months?"
What it reveals: Whether the vendor has a working release pipeline that runs without heroics. An enterprise app that ships weekly has a mature release process: automated builds, automated testing, release notes generated, and submission prepared without a manual all-hands effort. Vendors who ship irregularly are typically shipping when a build finally passes all manual checks, not because they have a smooth pipeline.
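A claimed weekly cadence is easy to verify from raw submission dates: compute the gaps between consecutive submissions and flag any that break the rhythm. A sketch, with hypothetical dates and a couple of days of slack around the 7-day target:

```python
from datetime import date

def cadence_gaps(submission_dates, max_gap_days=9):
    """Return (gaps, misses): day-gaps between consecutive store
    submissions, and the subset that break a roughly weekly cadence.

    max_gap_days=9 allows weekend slippage around a 7-day target.
    """
    dates = sorted(submission_dates)
    gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
    misses = [g for g in gaps if g > max_gap_days]
    return gaps, misses

# Hypothetical three months of a vendor's submission dates -- note the
# missed week in mid-September.
subs = [date(2025, 8, d) for d in (4, 11, 18, 25)] + \
       [date(2025, 9, d) for d in (1, 8, 22, 29)]
gaps, misses = cadence_gaps(subs)
print(f"gaps: {gaps}, cadence breaks: {misses}")
```

One or two breaks over three months with an explanation is normal; a gap list that looks like random intervals is the signature of release-by-heroics.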
Q7: How do you generate documentation?
The question: "How does your team produce documentation during an engagement, and when is it written?"
Weak answer: "Engineers write documentation at the end of each milestone." Documentation written after the fact is always incomplete. Engineers do not remember all the decisions made three months ago, and the decisions that get omitted are usually the ones that explain a non-obvious architectural choice.
Strong answer: "Architecture decisions are documented at the time they are made, in a short decision record that captures the decision, the alternatives considered, and the reason for the choice. Release notes are AI-generated per release cycle, with a human review pass for accuracy. The onboarding guide is maintained throughout the engagement and updated whenever the setup process changes. Nothing is written in a documentation sprint at the end."
Follow-up probe: "Can you walk me through the last architecture decision you documented and what the record looks like?"
What it reveals: Whether documentation is embedded in the delivery workflow or treated as a post-engagement task. Documentation written in real time is accurate. Documentation written at the end of an engagement is, at best, an approximation. The practical difference: an onboarding guide maintained throughout the engagement lets a new engineer become productive in two weeks. An onboarding guide written at the end takes two weeks to produce and is still incomplete.
Q8: What happens if a key engineer leaves mid-engagement?
The question: "If your most experienced engineer on our engagement leaves in month six, what is the continuity plan?"
Weak answer: "We will find a replacement." This describes a hiring intention, not a continuity plan.
Strong answer: "The onboarding guide and architecture documentation exist from day one. A new engineer can read the documentation and become productive in under two weeks without a knowledge transfer session from the departing engineer. We have tested this on past engagements where team composition changed mid-engagement. The documentation is the continuity plan, not the individual engineer."
Follow-up probe: "How long did it take a new engineer to reach full productivity in the most recent team composition change you experienced?"
What it reveals: Whether the vendor has systematized continuity or relies on individual knowledge. A team where all the context lives in a single engineer's head is a delivery risk. A team with architecture documentation, an onboarding guide, and AI-generated release notes that capture what shipped and when is a team that can absorb personnel changes without losing delivery momentum.
Vendor scorecard
Use the table below to score vendors in evaluation. Score each question 0 (weak answer, no artifact), 1 (strong answer, artifact takes more than 24 hours to produce), or 2 (strong answer, artifact produced within 24 hours). A vendor scoring 14 or above has mature AI-augmented delivery infrastructure.
| Question | Weak answer signals | Strong answer signals | What it reveals |
|---|---|---|---|
| Q1: Velocity data | "We move fast", process description | Chart with trend data and plan comparison from 3 engagements | Whether delivery is measured at all |
| Q2: UI regression handling | "We do thorough manual QA" | Screenshot regression report with device matrix and diff resolution | Whether automated regression infrastructure exists |
| Q3: Code review artifacts | "Senior engineers review everything" | Per-change log with AI findings, severity classifications, resolutions | Whether review is audit-ready |
| Q4: Start timeline | "After contract we will scope" | First working software in 4 weeks with week-by-week plan | Whether onboarding is systematized |
| Q5: Ownership at end | "The app and source code" | App + test suite + ADRs + onboarding guide + release logs | How vendor thinks about buyer's long-term risk |
| Q6: Release cadence | "We ship regularly" | Weekly App Store submission data from last 3 months | Whether the release pipeline runs without heroics |
| Q7: Documentation generation | "Engineers write docs at end of milestones" | ADRs written in real time, AI-generated release notes per cycle | Whether documentation is embedded or aspirational |
| Q8: Engineer departure plan | "We will find a replacement" | Documented onboarding guide, new engineer productive in under 2 weeks | Whether continuity relies on individuals or systems |
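The scoring rule above is simple enough to express directly. A sketch of the scorecard arithmetic, run against a hypothetical vendor's answers:

```python
# 0 = weak answer, no artifact
# 1 = strong answer, artifact took more than 24 hours
# 2 = strong answer, artifact produced within 24 hours
SCORE = {"weak": 0, "slow_artifact": 1, "fast_artifact": 2}
QUESTIONS = ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6", "Q7", "Q8"]

def score_vendor(answers):
    """answers: {question: 'weak' | 'slow_artifact' | 'fast_artifact'}.

    Returns (total out of 16, whether the vendor clears the 14-point bar
    for mature AI-augmented delivery infrastructure).
    """
    total = sum(SCORE[answers[q]] for q in QUESTIONS)
    return total, total >= 14

# Hypothetical vendor: fast artifacts everywhere except a slow Q4 and a weak Q7.
answers = dict.fromkeys(QUESTIONS, "fast_artifact")
answers["Q4"] = "slow_artifact"
answers["Q7"] = "weak"
print(score_vendor(answers))  # (13, False) -- one slow and one weak answer misses the bar
```

The 14-point bar tolerates at most two dropped points across all eight questions, which is the intent: a single weak answer plus a single slow artifact is already disqualifying.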
The eight questions take 90 minutes to run through with a vendor who is prepared. A vendor who can answer all eight with artifacts within 24 hours has built systems, not just teams. The difference between a system and a team is what happens when a key engineer leaves, when a release cycle gets compressed, or when a compliance auditor asks for evidence of your review process.
Wednesday ships to the App Store weekly. Every change has a review log. Every release has AI-generated notes. Every engagement has an onboarding guide that exists before the second engineer joins the team. These are not aspirational standards. They are the default operating mode. If you want to verify any of them before signing, we will show you the artifacts in the first call.
Bring these eight questions to a 30-minute call with a Wednesday engineer. We will answer every one with the artifact.
Frequently asked questions
About the author
Anurag Rathod
LinkedIn →
Technical Lead, Wednesday Solutions
Anurag leads technical delivery at Wednesday Solutions, overseeing architecture and engineering standards across enterprise mobile engagements in fintech, logistics, and retail.
Four weeks from this call, a Wednesday squad is shipping your mobile app. 30 minutes confirms the team shape and start date.
Get your start date →