What Good Mobile Development Actually Looks Like, So You Know When Yours Is Broken

Most CTOs can tell something is wrong with their mobile vendor. Fewer can say what right looks like. These are the six benchmarks.

Bhavesh Pawar · Technical Lead, Wednesday Solutions
7 min read · Published Feb 26, 2026 · Updated Feb 26, 2026

Most CTOs can tell when something is wrong with their mobile vendor. Releases are slow, quality is inconsistent, communication breaks down before deadlines. What is harder to articulate is what the right baseline looks like — which makes it difficult to have a productive conversation with a vendor about performance, and difficult to evaluate a new vendor with confidence.

The six benchmarks below describe what a well-run enterprise mobile engagement looks like. They are not ideal-case numbers. They are the standard a competent vendor maintains consistently.

Key benchmarks

Release cadence: weekly or biweekly. Monthly or longer without a specific external reason is a process problem.

Estimation accuracy: within 20 percent variance on most features. Consistent variance above 40 percent indicates broken scoping.

Communication: problems surface with two or more weeks of lead time. Updates arrive without being requested.

Quality: 99.5 percent or higher crash-free sessions for mature apps. Bug density trending down, not stable or growing.

Team stability: same core engineers for six or more months. No silent team changes.

Visibility: you know delivery status without asking. You do not find out about problems through your users.

Why the benchmark matters

A vendor assessment without a benchmark is a gut check. A vendor assessment with a benchmark is a comparison against an observable standard. The difference matters when you are trying to decide whether to have a performance conversation, whether to extend a contract, or whether to run a competitive evaluation.

The benchmarks below are drawn from Wednesday's experience across more than 50 enterprise mobile engagements, including assessments of engagements taken over from other vendors. They represent what a professional mobile development team delivers when it is operating correctly — not the ceiling, but the floor.

Release cadence

Good: weekly releases to the App Store or Play Store, or biweekly with a clear release calendar communicated in advance.

Acceptable: monthly releases for highly complex features with genuine regulatory or integration constraints that prevent faster cycles.

Problem: longer than monthly without a specific documented reason, or a release schedule that shifts regularly without explanation.

The reason release cadence is a quality signal — not just a speed signal — is that the discipline required to release weekly is the same discipline that produces reliable software. Automated testing, short code review cycles, clean handoffs between QA and deployment. Teams that cannot release on a weekly cadence are usually being slowed by a manual process that a mature team replaces with tooling. The slowness and the quality problems share the same root cause.

One benchmark worth tracking: how many releases were submitted versus how many passed App Store review on the first attempt. A high rejection rate indicates the team is not running pre-submission compliance checks. App Store rejection delays are not Apple's problem — they are a process gap on the vendor's side.
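
If you want to track this yourself, the arithmetic is trivial. Here is a minimal sketch in Python, assuming a hypothetical per-release log where each entry records whether the build cleared review on the first attempt; the field names are illustrative rather than taken from any particular tool.

```python
# Minimal sketch: first-pass App Store acceptance rate.
# The release log below is hypothetical; in practice it would be
# assembled from your submission history in App Store Connect.
releases = [
    {"version": "3.4.0", "passed_first_review": True},
    {"version": "3.5.0", "passed_first_review": False},  # rejected, resubmitted
    {"version": "3.6.0", "passed_first_review": True},
    {"version": "3.7.0", "passed_first_review": True},
]

first_pass = sum(r["passed_first_review"] for r in releases)
rate = first_pass / len(releases)
print(f"First-pass acceptance: {first_pass}/{len(releases)} = {rate:.0%}")
```

On this invented example the rate is 75 percent; a vendor running genuine pre-submission compliance checks should be able to keep the number far closer to 100 over a six-month window.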

Estimation accuracy

Good: within 20 percent variance on most features. Larger variance on features with genuinely unknown third-party dependencies, surfaced in advance.

Acceptable: 20 to 40 percent variance occasionally, with a clear explanation tied to a specific unexpected complexity, not a pattern.

Problem: consistent variance above 40 percent across multiple features and multiple cycles. Estimates that always expand but never contract.

Estimation is not about predicting the future with precision. It is about having a scoping process that identifies the significant unknowns before the work starts. A vendor that consistently underestimates has either not done the scoping work, or has scoped to win the business rather than to plan the delivery. Both are the vendor's problem, not a natural feature of mobile complexity.

Ask your vendor for their estimation accuracy history across the last six months. A vendor that tracks this number and can share it has a mature delivery process. A vendor that does not track it cannot tell you whether it is good or bad.
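
For reference, the variance number itself is easy to compute once estimates and actuals are logged. A minimal sketch with invented (estimate, actual) effort pairs, measured in days; any consistent unit works, and the feature names are illustrative.

```python
# Sketch: estimation variance per feature, and the share of features
# landing within the 20 percent band described above. All data invented.
features = {
    "checkout redesign":  (10, 11),
    "push notifications": (5, 8),
    "offline sync":       (15, 23),
    "biometric login":    (4, 4.5),
}

for name, (estimate, actual) in features.items():
    variance = abs(actual - estimate) / estimate
    print(f"{name}: {variance:.0%} variance")

within_20 = sum(abs(a - e) / e <= 0.20 for e, a in features.values())
print(f"{within_20}/{len(features)} features within 20% variance")
```

A vendor with a mature delivery process maintains exactly this kind of ledger; the useful signal is not any single number but whether the within-20-percent share holds across multiple cycles.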

Communication standard

Good: risk surfaces two or more weeks before it becomes a miss. Written weekly update, not just a meeting. Problems come with options attached, not just news.

Acceptable: occasional late-surfacing of risk on genuinely unexpected issues, with a clear explanation.

Problem: you find out about deadline misses on the day they were due. Status updates require chasing. Escalations arrive without options.

The communication standard that separates strong vendors from weak ones is not the volume of updates — it is the quality of risk surfacing. A vendor that tells you two weeks in advance that a feature is at risk gives you options. You can adjust scope, extend the deadline, or reprioritize. A vendor that tells you on the day gives you none.

The format of the weekly update also matters. An update that only reports what shipped is a backward-looking document. The useful part of a weekly update is the forward-looking section: what is at risk next week, what decisions or inputs are needed from your side, and what the vendor is watching. If your vendor's update never has anything in the at-risk section, ask why.

Quality floor

Good: 99.5 percent or higher crash-free sessions for a mature app in steady-state operation. Bug density per release trending downward over a six-month period. Zero critical bugs reaching users without same-day detection and response.

Acceptable: 99 percent crash-free sessions during a major rebuild or integration phase, with a clear path back to 99.5 percent.

Problem: crash-free rate below 99 percent on a mature app with no active architectural changes. Bug density stable or growing. Critical bugs discovered through user reviews rather than internal monitoring.

The crash-free rate is the single most observable quality metric and the one most vendors are reluctant to share unprompted. Ask for it directly. If your vendor does not have it, ask why crash reporting has not been instrumented. If they are reluctant to share it, that reluctance is data.

Bug density is a less commonly tracked metric but a more revealing one. A team that is genuinely improving its process produces fewer bugs per release cycle over time. A team that is patching problems without addressing root causes maintains a steady bug rate or sees it grow. Tracking this over six months gives you a clear picture of whether the underlying quality trajectory is moving in the right direction.
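
Both metrics reduce to simple arithmetic once the data is exported. The sketch below uses invented numbers; in practice the session counts would come from a crash-reporting tool such as Crashlytics or Sentry, and the bug counts from your issue tracker.

```python
# Sketch: the two quality metrics from this section, with invented data.
sessions_total = 1_200_000
sessions_crashed = 4_800
crash_free = 1 - sessions_crashed / sessions_total
print(f"Crash-free sessions: {crash_free:.2%}")  # 99.60%, above the 99.5% floor

# Bug density per release over six monthly releases, oldest first.
bugs_per_release = [34, 31, 28, 27, 22, 19]
# Crude trend check: compare the average of the first and last three releases.
early, late = bugs_per_release[:3], bugs_per_release[3:]
trend = sum(late) / len(late) - sum(early) / len(early)
print("Bug density is", "trending down" if trend < 0 else "flat or growing")
```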

Team stability

Good: the same core engineers on your product for six or more months. New engineers shadowed before owning work independently. Team changes communicated proactively before they happen.

Acceptable: one team change in a 12-month engagement, communicated in advance, with a planned handover.

Problem: team changes without notification. Engineers presented during the sales process replaced by more junior engineers after contract signing. Core contacts changing every quarter.

Team stability matters because context is expensive to rebuild. An engineer who has worked on your product for six months understands its history, its constraints, and its non-obvious decisions. Replacing that engineer with someone new costs two to four weeks of ramp-up time, during which delivery slows and the risk of introducing problems increases. Vendors that rotate staff frequently are spreading context across multiple engagements at your expense.

The most reliable way to check this is to ask your vendor directly: who is working on my product today, and is that the same team that was working on it three months ago? The answer will tell you whether the team you are paying for is the team you are getting.

Delivery visibility

Good: you know delivery status without asking. Release notes arrive with every submission, specific and detailed. You find out about problems before your users do.

Acceptable: status available on request within 24 hours.

Problem: status requires repeated chasing. Release notes are generic or absent. You find out about problems through user reviews or support tickets.

Visibility is a process, not a technology. A vendor with good delivery visibility has instrumented the app with crash reporting, has a monitoring process that catches issues before users do, and has a reporting cadence that keeps you informed without requiring you to ask. The absence of these is not a tooling gap — it is a decision about whether client visibility is worth the investment.

The simplest test: when was the last time your vendor told you about a problem before you asked? If you cannot remember an instance, you do not have visibility — you have a vendor that manages information rather than sharing it.

How to score your engagement

Score your current vendor against the six benchmarks above. Give one point for each benchmark where the "good" standard is met consistently; a short tally sketch follows the score bands below.

5 to 6 points: you have a solid vendor. The relationship is worth protecting. If one or two benchmarks are below standard, raise them directly — a vendor performing well on four or five will typically respond well to specific feedback.

3 to 4 points: you have a vendor with specific fixable problems. Have a direct conversation about the two or three benchmarks that are below standard. Set a 30-day window to assess whether performance changes. If it does not, treat this the same as a lower score.

0 to 2 points: the vendor is not performing at the standard required for an enterprise mobile engagement. A conversation may produce short-term improvement, but the underlying delivery process is not fit for purpose. A competitive evaluation is warranted.
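
For completeness, here is the tally as a minimal sketch. The booleans are hypothetical answers for an example engagement, with True meaning the "good" standard is met consistently.

```python
# Sketch: tallying the six-benchmark score described above.
# The answers below are invented for illustration.
benchmarks = {
    "release cadence":     True,
    "estimation accuracy": True,
    "communication":       False,
    "quality":             True,
    "team stability":      False,
    "visibility":          True,
}

score = sum(benchmarks.values())
if score >= 5:
    verdict = "solid vendor, protect the relationship"
elif score >= 3:
    verdict = "fixable problems, set a 30-day window"
else:
    verdict = "run a competitive evaluation"
print(f"Score: {score}/6 -- {verdict}")
```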

The benchmarks above are not a judgment — they are a description of what a professional vendor delivers consistently. If your current engagement is below standard on most of them, you are not dealing with a difficult project. You are dealing with a vendor that has not built the processes required to deliver at enterprise scale.

If you want a second opinion on how your current vendor scores against these benchmarks, a 30-minute call with a Wednesday engineer covers the assessment.

About the author

Bhavesh Pawar

Technical Lead, Wednesday Solutions

Bhavesh is a Technical Lead at Wednesday Solutions with hands-on depth across React Native, iOS, Android, and Flutter. He has shipped mobile products and enterprise AI solutions across edtech, entertainment, and medtech, and reviews architecture across Wednesday engagements.
