
What to Demand From a Mobile Vendor Before Giving Them Core Infrastructure

Most enterprises hand core mobile infrastructure to a vendor based on the quality of their pitch deck. Here is the evidence a vendor should provide before they touch anything that cannot break.

Ali Hafizji · CEO & Co-founder, Wednesday Solutions
8 min read · Published May 4, 2026 · Updated May 4, 2026
4x faster with AI
2x fewer crashes
10x more work, same cost
4.8 on Clutch

Trusted by teams at

American Express
Visa
Discover
EY
Smarsh
Kalshi
BuildOps
Kunai

The NPS system that India's largest exam prep platform handed to Wednesday is one of only two internal services that every other service in their entire product ecosystem depends on. Five million students. $400M in revenue. Every internal system routing through it. That kind of trust does not come from a vendor pitch. It comes from a vendor that spent the first months of an engagement proving — through delivered work, held deadlines, and zero escalations — that they could be trusted with something that cannot break.

Most enterprises invert this sequence. They evaluate the pitch deck, approve the vendor, and hand over the core work first. When the vendor fails, the failure is expensive, visible, and hard to recover from.

The right sequence is different: establish the evidence, run a bounded pilot, evaluate the output, then decide what to trust them with.

What trust in a vendor is actually built on

A pitch deck is not evidence. A case study is evidence. A live reference call with a VP Engineering at a prior client is evidence. A pilot that ran on time with no rework is evidence.

Every other input to the vendor evaluation process is a proxy for the real question: what happens when they are alone with your code and no one is watching?

What core infrastructure means in mobile

Not every part of a mobile product carries the same risk. A new onboarding screen can be rolled back if it fails. The authentication service cannot. The payment processing integration cannot. The data sync layer that a field worker's offline data depends on cannot.

Core infrastructure in a mobile product is anything where a failure does not produce a bad user experience — it produces an outage. Data that does not sync is not a UX problem. It is a business continuity problem. A failed authentication service does not frustrate users. It locks them out.

Before handing any scope to a mobile vendor, classify the work:

  • Peripheral features: New screens, UI changes, content updates. Low consequence if they fail. Revertable.
  • Integrated features: Features that connect to existing systems — payment flows, notification infrastructure, analytics. Medium consequence. Need testing against production-like environments.
  • Core infrastructure: Authentication, data sync, API gateways, services that other services depend on. High consequence. Failure is visible, wide, and slow to fix.

A vendor's first engagement should start with peripheral or integrated features. Core infrastructure comes after they have proven they can handle the first two categories without your engineering lead carrying the work.

The five questions before handing over trust

1. What is the most critical thing you have built for a prior client — and what happened when it failed?

A vendor that has only built peripheral features will give you a generic answer. A vendor that has built core infrastructure will give you a specific one, including what their incident response looked like, because everything that can break eventually does. The quality of the incident response answer tells you as much as the original delivery story.

2. What does your QA process look like before a release goes to production?

The correct answer involves automated testing — unit, integration, end-to-end — plus a CI gate that blocks deployment until the test suite passes. Any answer that involves only manual QA, or "our engineers review their own work," is a signal that defects will reach users regularly.
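
Mechanically, the gate is usually just one aggregate verification task that CI runs on every commit, with deployment allowed only when it passes. A minimal sketch in Gradle's Kotlin DSL for an Android project, where the end-to-end task name is a placeholder for whatever suite the vendor actually runs:

```kotlin
// build.gradle.kts — illustrative only; "e2eTest" is a placeholder task name.
// "test" (local unit tests) and "connectedAndroidTest" (instrumented tests)
// are standard Android Gradle tasks.
tasks.register("releaseGate") {
    group = "verification"
    description = "Aggregate gate: unit, integration, and end-to-end suites."
    dependsOn("test", "connectedAndroidTest", "e2eTest")
}
```

CI then runs ./gradlew releaseGate on every commit and only promotes a build to a release track when the task succeeds.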

3. What does your engineering lead do when there is a blocker they cannot resolve at their level?

A self-managing vendor has a clear internal escalation path. A vendor that relies on your VP Engineering to resolve blockers does not. Ask for a specific recent example of a technical decision that required escalation — who escalated it, to whom, and how it was resolved. If the answer involves your engineering lead or someone at the client organization, the vendor's team is not senior enough for unsupervised core infrastructure work.

4. Show me the last post-release incident report you wrote for a client.

A vendor that writes post-release incident reports has a quality culture. A vendor that does not has a blame culture. The incident report should describe what happened, what caused it, what was done to fix it, and what process change prevents it from happening again. If the vendor has never written one, either nothing has ever broken (unlikely) or nothing has ever been documented (much more likely).

5. What did a prior client trust you with after the initial engagement — and why?

The clearest signal of a vendor's actual quality is not what they were hired to do initially. It is what they were given after they proved themselves. A vendor asked to build peripheral features in month one and handed core infrastructure in month four has a documented track record of earning trust. A vendor that has only ever built what they were first hired for does not.

What a vendor should demonstrate, not claim

The evaluation process fails when it treats vendor claims as evidence. "We have extensive experience in enterprise mobile" is not evidence. "Here is the VP Engineering from our last enterprise engagement — call them" is evidence.

Three things to demand as demonstrated output, not claimed capability:

A live codebase walkthrough. Not slides. Not screenshots. A senior engineer from the vendor walking through code they wrote for a prior client — explaining the architecture decisions, the trade-offs, and the things they would do differently. This takes 45 minutes. It tells you more about the team's engineering depth than any RFP response.

A test suite for a prior project. Ask to see the test coverage report from a prior engagement. Not 100% coverage — that is not the point. The point is whether they have unit tests covering business logic, integration tests covering API dependencies, and a CI pipeline that runs the suite on every commit. If they cannot show you this, they do not have it.

References at the engineering level, not the account level. A reference from a business development contact at a prior client tells you nothing about engineering quality. A reference from the VP Engineering or CTO who worked alongside the vendor's team daily tells you everything. Ask for two references at that level. If they can only provide one, or if the references are filtered through the vendor's account team, treat that as a signal.

Wednesday provides VP Engineering references from every major engagement. 30 minutes gets you the reference list and a clear picture of what a Wednesday engagement delivers.

Talk to a reference

The automated testing requirement

Automated testing is not a quality enhancement for a mobile vendor doing core infrastructure work. It is a prerequisite.

A mobile app serving an enterprise workforce has users who depend on it for their daily work. A field technician whose job logging app crashes loses documentation. A nurse whose clinical app fails during a shift loses patient records. A driver whose dispatch app goes down during a route loses their assignment. These are not edge cases. They are the normal consequence of shipping a defect to a user who is in the middle of using the app.

The only reliable way to prevent defects from reaching production is a test gate that runs before every release. Not manual QA — manual QA misses things, and its coverage degrades as the app grows. Automated tests that run on every commit, covering the paths users actually follow, and blocking deployment when they fail.

Wednesday's automated testing standard includes screenshot regression testing — automated visual comparison of every screen on every device configuration before a release goes out. A UI change that breaks layout on a specific Android device size will fail the screenshot comparison and block the release before a user sees it. Manual QA would catch it only if the tester happened to test that device configuration on that day.
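
Wednesday's internal tooling is not shown here, but as an illustration of the technique, an Android screenshot test written with a library like Paparazzi renders a screen against a fixed device profile and fails when the rendered image no longer matches the committed golden file. The class and view in this sketch are made up:

```kotlin
import android.widget.TextView
import app.cash.paparazzi.DeviceConfig
import app.cash.paparazzi.Paparazzi
import org.junit.Rule
import org.junit.Test

class CheckoutScreenshotTest {
    // One device profile shown; in practice the same test runs across every
    // device configuration the release supports.
    @get:Rule
    val paparazzi = Paparazzi(deviceConfig = DeviceConfig.PIXEL_5)

    @Test
    fun checkoutSummaryMatchesGoldenImage() {
        // A real test would render the production screen; a plain view keeps
        // this sketch self-contained.
        val view = TextView(paparazzi.context).apply { text = "Order total: 42.00 USD" }
        paparazzi.snapshot(view)
    }
}
```

Paparazzi's Gradle plugin adds record and verify tasks per build variant, so the image comparison can run inside the same CI gate described above.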

Ask any vendor you are evaluating: what percentage of your releases in the last 12 months required a hotfix within seven days? Above 20 percent means the test gate is not working. Above 35 percent means the test gate does not exist in a meaningful form.
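
The metric itself is trivial to compute, assuming the vendor tracks ship dates and hotfix dates at all. A small Kotlin sketch, where the Release shape is illustrative:

```kotlin
import java.time.LocalDate
import java.time.temporal.ChronoUnit

// Illustrative record: when a release shipped and when its first hotfix
// shipped, if one was needed.
data class Release(val shipped: LocalDate, val firstHotfix: LocalDate? = null)

// Percentage of releases that needed a hotfix within `windowDays` of shipping.
fun hotfixRate(releases: List<Release>, windowDays: Long = 7): Double {
    if (releases.isEmpty()) return 0.0
    val hotfixed = releases.count { release ->
        release.firstHotfix != null &&
            ChronoUnit.DAYS.between(release.shipped, release.firstHotfix) <= windowDays
    }
    return hotfixed * 100.0 / releases.size
}
```

Twelve releases with three seven-day hotfixes is a 25 percent rate, which by the thresholds above already points to a gate that is not doing its job.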

How to run a structured pilot

A pilot that does not have defined success criteria before it starts is not a pilot. It is a free sample. Free samples do not give you the evidence you need to decide whether to trust a vendor with core infrastructure.

Structure a four-week pilot with:

A bounded real scope. One feature or one clearly defined subsystem. Not a proof of concept. Not a tutorial. A real piece of work that will ship to real users if it passes the quality bar.

Defined success criteria. Before work begins, write down what on-time looks like, what defect rate is acceptable, and how many times the VP Engineering can be pulled in before the pilot fails. Share these with the vendor before day one.

A delivery review at the end. Not just "did it ship?" Review the code. Review the test coverage. Review how the vendor communicated during the four weeks. Look at whether they surfaced blockers proactively or waited until they became delays.

A deliberate decision. After the pilot, decide explicitly: does this vendor's output, process, and communication quality meet the bar required for core infrastructure work? The decision should be binary. A pilot that results in "let's see how it goes" is not a pilot. It is drift.

The vendor evaluation scorecard

Score each vendor across seven dimensions before a final decision. Each runs from one (does not meet bar) to four (exceeds bar):

  • Engineering depth: Can the team explain architectural decisions and trade-offs without prompting?
  • Test coverage: Do they have automated tests and a CI gate?
  • Self-management: Does the engagement require VP Engineering involvement beyond 2 hours per week?
  • Communication: Are status updates proactive, or do you chase them?
  • Track record: Can they provide VP Engineering references from enterprise work?
  • Incident response: Do they have documented incident reports from prior engagements?
  • Trust earned: What did prior clients trust them with after the initial engagement?

A vendor that scores three or above on all seven is ready for core infrastructure. A vendor with a single score of one is not, regardless of how the other six dimensions come out. That one is a known failure mode.
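
Expressed as a decision rule, assuming the seven dimension names above as map keys, the scorecard reads roughly like this:

```kotlin
// Scores run from 1 (does not meet bar) to 4 (exceeds bar), one per dimension.
data class VendorScorecard(val scores: Map<String, Int>) {
    // Every dimension at 3 or above: cleared for core infrastructure work.
    val readyForCoreInfrastructure: Boolean
        get() = scores.values.all { it >= 3 }

    // Any single 1 is a known failure mode, regardless of the other six.
    val hasKnownFailureMode: Boolean
        get() = scores.values.any { it == 1 }
}
```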

The case studies show real deliverables, real timelines, and what Wednesday was trusted with after the initial engagement proved the quality bar.

See what Wednesday delivers

About the author

Ali Hafizji

LinkedIn →

CEO & Co-founder, Wednesday Solutions

Ali has been building mobile apps for 15 years and is the author of two published iOS development books. He has shipped Flutter, iOS, and Android products across travel, gig economy, and ecommerce, and leads enterprise AI enablement across Wednesday engagements. He co-founded Wednesday Solutions and architects the AI-native engineering workflow the team ships with on every engagement.

30 minutes with an engineer. You leave with a squad shape, a monthly cost, and a start date.

Get your start date
4x faster with AI · 2x fewer crashes · 100% money back

Shipped for enterprise and growth teams across US, Europe, and Asia

American Express
Visa
Discover
EY
Smarsh
Kalshi
BuildOps
Kunai
Allen Digital
Ninjavan
Kotak Securities
Rapido
PharmEasy
PayU
Simpl
Docon
Nymble
SpotAI
Zalora
Velotio
Capital Float
Buildd
Kalsi