Writing

How to Evaluate a Flutter Development Vendor: The Complete Scorecard for US Enterprise 2026

Eight questions separate Flutter vendors with genuine enterprise experience from those who have only shipped MVPs. Ask them before signing, not after.

Rameez Khan · Head of Delivery, Wednesday Solutions
9 min read · Published Apr 24, 2026 · Updated Apr 24, 2026
Trusted by teams at American Express, Visa, Discover, EY, Smarsh, Kalshi, BuildOps

60% of Flutter agencies claiming enterprise experience have fewer than three production Flutter apps in the App Store. That gap between claim and evidence is the central problem in Flutter vendor evaluation. A vendor that has shipped five or more production Flutter enterprise apps has encountered and solved the failure modes that a vendor with one or two apps has not yet seen. Eight specific questions expose that gap before you sign, not after your deadline slips.

Key findings

60% of agencies claiming enterprise Flutter experience have fewer than three production Flutter apps in the App Store — the most basic verification step eliminates the majority of unqualified vendors.

Wednesday has shipped 10+ Flutter apps to production enterprise clients with a weekly release cadence across all active engagements.

The eight questions in this article produce a vendor evaluation that takes 30 minutes and reliably separates Flutter veterans from Flutter beginners.

Wednesday's Flutter clients rate the company 4.8/5 on Clutch, with reviewers specifically citing on-time delivery, proactive problem-finding, and work that exceeded expectations.

Why Flutter vendor evaluation is different

General mobile vendor evaluation focuses on process, communication, and references. These are necessary but not sufficient for Flutter-specific evaluation. Flutter has enough surface area — the widget rendering model, the Dart language, the Flutter-specific CI/CD requirements, the platform channel architecture for device integrations — that a vendor with strong general mobile experience but limited Flutter depth will underperform on Flutter-specific requirements.

The eight questions below are Flutter-specific, not general software quality questions. Each has a specific technically correct answer, and that answer tells you whether the vendor has actually shipped Flutter at enterprise scale or has merely delivered enough MVPs to sound credible in a sales call without being qualified for an enterprise engagement.

The questions are designed to be asked in a 30-minute technical call with the lead engineer or CTO, not with the sales lead. Sales leads can give credible-sounding answers to technical questions without the technical depth to back them up. The lead engineer's answer to these questions cannot be faked without genuine experience.

Question 1: production apps in the App Store

Ask: "Can you name three production Flutter apps that are currently in the App Store that your team built? I'd like to be able to download and test them."

The correct answer is three specific app names, available on the App Store, with the vendor's name in the developer attribution or a verifiable description of their involvement.

Red flags: naming apps that are no longer available, naming apps where the vendor relationship ended more than 18 months ago, inability to name three apps, mentioning internal tools or white-label apps that cannot be independently verified.

Why it matters: downloading the apps tells you about release freshness (when was the last update?), UI consistency across iOS and Android, performance on your test device, and whether the apps' quality matches the vendor's claims.

Question 2: crash-free rate target

Ask: "What crash-free rate do you target for enterprise Flutter apps, and how do you measure it?"

The correct answer names a specific rate (99%+ for enterprise, with 99.5% as a strong standard), names the measurement tool (Firebase Crashlytics, Sentry, or equivalent), and describes how the reporting is segmented (by device model, by OS version, by app version).

Red flags: "We aim for as few crashes as possible" without a number, reference only to testing without a production monitoring tool, a number below 99% presented as acceptable for enterprise use.

Why it matters: crash-free rate is the most fundamental mobile quality metric. A vendor that does not target a specific rate and measure it in production is not managing quality — they are shipping and hoping.
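As a concrete illustration (not any particular tool's API), the crash-free rate a dashboard such as Crashlytics or Sentry reports is simply the share of sessions without a crash, broken down per segment:

```python
from collections import defaultdict

def crash_free_rates(sessions):
    """Crash-free session rate per segment.

    `sessions` is a list of (segment, crashed) tuples, e.g. the
    per-OS-version breakdown a monitoring tool would aggregate.
    """
    totals = defaultdict(int)
    crashes = defaultdict(int)
    for segment, crashed in sessions:
        totals[segment] += 1
        if crashed:
            crashes[segment] += 1
    return {seg: 1 - crashes[seg] / totals[seg] for seg in totals}

# Example: 1 crash in 1,000 sessions on Android 14 -> 99.9% crash-free
sessions = [("android-14", False)] * 999 + [("android-14", True)]
rates = crash_free_rates(sessions)
```

Segmenting the rate matters because a 99.5% overall number can hide a 95% number on one mid-range device family.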

Question 3: Flutter updates and breaking changes

Ask: "How do you handle Flutter stable releases and breaking changes for active enterprise clients?"

The correct answer describes a process: a post-stable-release evaluation period (typically 2 to 4 weeks), a dependency audit run to identify breaking changes in the plugin ecosystem, automated tests run against the new Flutter version, and a scheduled update included in the regular release cycle.

Red flags: "We update when we need to," "We update as soon as the stable version is released" (without mentioning a dependency audit), "The client decides when to update," no mention of dependency audits.

Why it matters: Flutter stable releases happen roughly twice per year. Each release can break plugins in the dependency tree. An agency without a structured update process will either apply updates that break the app or defer updates indefinitely, accumulating technical debt.
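A structured update process can be as simple as a fixed checklist run after each stable release. A hypothetical sketch of that checklist using the standard Flutter CLI:

```shell
# Hypothetical post-stable-release checklist (run in each client repo)
flutter upgrade                 # move the toolchain to the new stable
flutter pub outdated            # audit the plugin tree for breaking versions
flutter pub upgrade --dry-run   # preview dependency resolution changes
flutter analyze                 # surface deprecation and API-change warnings
flutter test                    # run the suite against the new version
```

The point is not the specific commands but that each step is scheduled and repeatable, so updates land in the regular release cycle instead of accumulating as debt.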

Question 4: state management choice and reasoning

Ask: "What state management do you use for complex enterprise Flutter apps and why?"

The correct answer names Bloc or Riverpod, explains the reasoning (Bloc for complex state with many interdependencies and a need for strict testability, Riverpod for cleaner code with reactive patterns), and acknowledges when simpler approaches are appropriate.

Red flags: recommending Provider for complex enterprise apps ("it's simpler"), mentioning setState as the primary state management for anything beyond local UI state, recommending GetX without acknowledging its testability limitations, inability to name the specific state management approach they use.

Why it matters: the wrong state management choice for app complexity is one of the five diagnosable Flutter enterprise failure modes. A vendor that recommends Provider or setState for a complex enterprise app does not have the depth to sustain quality as the app grows.
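The testability argument is language-agnostic: Bloc's core idea is that state changes only through explicit events handled by a pure transition function, which can be unit-tested without rendering a single widget. A minimal Python sketch of that pattern (names are illustrative, not the bloc library's API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CartState:
    items: int
    checked_out: bool

def reduce_event(state, event):
    """Pure event-to-state transition: trivially unit-testable."""
    if event == "add_item":
        return CartState(state.items + 1, False)
    if event == "checkout" and state.items > 0:
        return CartState(state.items, True)
    return state  # unknown or invalid events leave state unchanged

s = CartState(0, False)
for e in ["add_item", "add_item", "checkout"]:
    s = reduce_event(s, e)
```

With setState scattered across widgets there is no single transition function to test; with this pattern, every business rule is an assertion away.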

Question 5: non-flagship Android device testing

Ask: "How do you test Flutter apps on non-flagship Android devices, and what devices are in your test matrix?"

The correct answer names specific device models beyond the current flagship (Samsung Galaxy A-series, older Pixel models, or equivalent mid-range Android), describes a physical device testing process (not emulator-only), and mentions a specific number of device and OS combinations in the matrix.

Red flags: testing only on current flagship devices, testing only on emulators, inability to name specific non-flagship devices in the test matrix, a device matrix of fewer than eight devices.

Why it matters: enterprise user fleets include mid-range Android devices that behave differently from flagship devices. Flutter rendering bugs, performance issues, and plugin failures on non-flagship Android are the second most common cause of post-launch user complaints after crash-free rate failures.
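For scale, a device matrix is just the cross product of models and OS versions. An illustrative sketch (the models listed are examples, not any vendor's actual matrix):

```python
from itertools import product

# Example mid-range and flagship mix; a real matrix would prune
# device/OS pairs that do not ship together.
devices = ["Pixel 6", "Galaxy A54", "Galaxy S23", "Moto G Power"]
os_versions = ["Android 12", "Android 13", "Android 14", "Android 15"]

matrix = list(product(devices, os_versions))
# 4 devices x 4 OS versions = 16 combinations
```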

Question 6: release cadence

Ask: "What is your current release cadence for active Flutter enterprise clients?"

The correct answer is weekly, with a description of how that cadence is maintained: an automated CI/CD pipeline, a release branch process, automated App Store submission, and a process for handling App Store review delays without disrupting the cadence.

Red flags: "When features are ready," "We do monthly releases," "It depends on the client," any answer that treats release cadence as a variable rather than a commitment.

Why it matters: weekly release cadence is the operating standard for enterprise mobile. It requires an automated pipeline, not a manual process. A vendor that cannot describe a weekly cadence process does not have one. A vendor without a weekly release cadence is delivering at slower speed than the market expects.
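A weekly cadence is typically enforced by the pipeline, not the calendar. A hypothetical sketch in GitHub Actions syntax (job names, steps, and the community `subosito/flutter-action` setup action are illustrative, not a specific vendor's pipeline):

```yaml
# Hypothetical weekly release workflow sketch
name: weekly-release
on:
  schedule:
    - cron: "0 9 * * MON"   # cut a release every Monday morning
jobs:
  release:
    runs-on: macos-latest
    steps:
      - uses: actions/checkout@v4
      - uses: subosito/flutter-action@v2   # install the Flutter toolchain
      - run: flutter test
      - run: flutter build ipa --release
      # automated App Store upload (e.g. via fastlane) would follow here
```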

Question 7: App Store Flutter-specific review issues

Ask: "Have you ever had a Flutter app rejected during App Store review for a Flutter-specific reason? What was the issue and how did you resolve it?"

The correct answer describes a specific experience — a Flutter plugin that was rejected for a native implementation issue, a Flutter app review flag for a specific entitlement requirement, or a review question about the Flutter engine's memory behavior. The resolution demonstrates a working knowledge of Apple's review process for Flutter apps.

A vendor with no App Store Flutter rejection experience has either shipped very few apps or been lucky. The answer should reflect genuine experience with the review process for Flutter-specific issues.

Red flags: "We've never had a rejection," "We submit and it goes through," claiming Flutter rejections are handled identically to any other app rejection, no specific experience with the App Store review process for Flutter.

Why it matters: App Store review for Flutter apps has Flutter-specific issues that appear without warning. An agency that has encountered and resolved them can navigate them on your timeline. One that has not will take longer to diagnose and resolve them.

Question 8: onboarding timeline

Ask: "If we signed tomorrow, when would your engineers be contributing to our weekly release?"

The correct answer is a specific number of weeks — four weeks is the Wednesday standard — with a description of what happens in those weeks: architecture review, development environment setup, CI/CD integration, and the first independently planned and shipped release.

Red flags: "A few months," "It depends on the complexity," "We'd need to do a discovery phase first," no specific commitment.

Why it matters: the onboarding commitment is a delivery commitment. A vendor who cannot commit to a specific onboarding timeline cannot commit to a delivery timeline. The engineering quality of the onboarding — how well the new team understands the architecture and contributes to the release cadence by the end of week four — reflects the overall engineering quality of the engagement.

Flutter vendor scorecard table

| Question | 3 points (evidence-backed) | 2 points (credible, no evidence) | 1 point (vague) |
| --- | --- | --- | --- |
| Production App Store apps | Names 3+ downloadable apps | Names 3 apps without download evidence | Fewer than 3 or cannot name |
| Crash-free rate | Specific rate + measurement tool | Specific rate without tool | No specific rate |
| Flutter update process | Describes specific process | Mentions process without detail | No structured process |
| State management | Names Bloc/Riverpod with reasoning | Names Bloc/Riverpod without reasoning | Names Provider or setState for complex apps |
| Device testing matrix | Names specific devices, 12+ | Describes device testing without specifics | Emulator only or flagship only |
| Release cadence | Weekly with automated pipeline | Weekly without pipeline description | Variable or monthly |
| App Store review experience | Describes specific rejection and fix | Mentions review experience without specifics | "No issues ever" |
| Onboarding timeline | Specific weeks with process | Specific weeks without process | Vague or months |

A score of 20 or above out of 24 indicates a vendor with genuine enterprise Flutter experience. Wednesday scores 24 out of 24 — every question has evidence-backed answers from production deployments.
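Scoring is mechanical: one point value per question, summed against the 20-of-24 threshold. A small illustrative scorer:

```python
def score_vendor(points):
    """points: 8 integers, each 1-3, one per scorecard question."""
    assert len(points) == 8 and all(1 <= p <= 3 for p in points)
    total = sum(points)
    verdict = "qualified" if total >= 20 else "not qualified"
    return total, verdict

# Example: strong on most questions, no evidence on three of them
result = score_vendor([3, 3, 3, 2, 3, 2, 2, 3])  # (21, 'qualified')
```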

Want to run this scorecard against Wednesday? Book a 30-minute technical call and ask every question on the list.

Get my recommendation

Wednesday's Flutter track record is public and verifiable. The retail engagement — 99% crash-free sessions at 20 million users, maintained across every release — is the most direct answer to every question in this scorecard. The crash-free rate is measured and reported. The release cadence is weekly. The device testing matrix covers 16 combinations. The onboarding timeline is four weeks. The App Store submission is automated. Every claim has evidence.

Run the eight questions on Wednesday in a 30-minute call. Leave with the specific answers that let you make a confident vendor decision.

Book my 30-min call
4.8 on Clutch · 4x faster with AI · 2x fewer crashes · 100% money back


Want more on vendor evaluation? The writing archive has mobile development vendor scorecards, contract frameworks, and Flutter agency comparisons.

Read more decision guides

About the author

Rameez Khan


Head of Delivery, Wednesday Solutions

Rameez Khan leads delivery at Wednesday Solutions and has run dozens of Flutter vendor transitions, technical assessments, and enterprise onboarding engagements.

Four weeks from this call, a Wednesday squad is shipping your mobile app. 30 minutes confirms the team shape and start date.

Get your start date

Shipped for enterprise and growth teams across US, Europe, and Asia

American Express
Visa
Discover
EY
Smarsh
Kalshi
BuildOps
Ninjavan
Kotak Securities
Rapido
PharmEasy
PayU
Simpl
Docon
Nymble
SpotAI
Zalora
Velotio
Capital Float
Buildd
Kunai