Every founder we talk to right now has the same problem. They have a CEO mandate to "do something with AI," a board that expects an AI roadmap in the next quarter, and a LinkedIn feed full of agencies promising to deliver it. The promises sound identical. The deliverables, when they arrive, very much do not.
The category called "AI agencies" has exploded over the last eighteen months. What used to be a niche of perhaps fifty serious firms globally is now a market of thousands. Some are genuinely capable shops with engineering depth, domain expertise, and a track record of getting AI into production. Many are repackaged digital agencies, automation vendors, or rebranded SEO shops riding the wave. The difference is invisible from a website.
This piece is a working operator's guide to telling them apart. It is written for founders, heads of product, and operators inside scaling companies - the people who have to make the call about which AI agency to hire and live with the consequences.
The pilot-to-production problem is the only problem that matters
If you take one thing from this article, take this. The single most important question to ask any AI agency is not "what can you build?" It is "what have you put into production, and how is it doing now?"
A working demo is not a deliverable. A pilot with twenty internal users is not a deliverable. A system that has been live for six months, that customer support is now dependent on, that has survived a model upgrade and a security review - that is a deliverable.
An excellent breakdown by xtraSaaS walks through the specific drivers that separate AI projects that ship from the ones that die in the pilot phase. The summary is uncomfortable: most projects fail not because the model is wrong, but because the surrounding system - the data plumbing, the human workflow, the governance - was never built. AI agencies that understand this lead with workflow design. The ones that don't lead with model architecture.
When you are interviewing AI agencies, push every conversation toward production. Ask: Name three systems you have built that are live today. What is the user count, request volume, and uptime? Who at the customer would speak to me about the engagement? What broke during the rollout, and how did you fix it?
The agencies that struggle with these questions are not the ones you want building your AI stack.
The wrong AI agency is more expensive than no AI agency
There is a real cost to hiring badly here, and it is not just the invoice. A useful piece from Agility Portal lays out the hidden costs of choosing the wrong AI consultancy: the months of internal time burned, the platform commitments that lock you in, the organisational fatigue that makes the second AI initiative harder to launch than the first.
We have seen this play out repeatedly. A Series B SaaS company spends £400k on an AI agency that ships a shiny copilot. Six months later, usage is at 3%, the agency has moved on, and the internal team, who never got to influence the design, has lost interest in AI entirely. The next attempt, two years later, has to overcome not just the technical problem but the organisational scar tissue left by the first one.
The opportunity cost is worse than the financial cost. In a market where competitors are getting AI into production this year, a failed engagement is a twelve-month setback you cannot recover from.
What good AI agencies actually look like in 2026
A few patterns separate the agencies that ship from the ones that pitch.
They are sector-specialised, not horizontally generic. The strongest AI agencies have gone deep in two or three verticals. They know what an insurance claims workflow looks like, or how a retail merchandising team makes decisions, or where a fintech's compliance team spends its time. Horizontal "we do AI for everyone" pitches are usually a sign that the agency is still figuring out what it does.
They have engineers, not just consultants. A genuine AI agency has senior engineers on staff (not just contracted in) who have shipped AI systems. If the team you meet in the pitch is all consultants and "AI strategists," ask where the engineering capacity comes from. If the answer is "we partner with…", you are hiring a middleman.
They are honest about model choice. The good agencies are model-agnostic. They will pick Claude, GPT, Gemini, an open-weight model, or a fine-tuned smaller model based on what the job needs. Agencies that lead with a fixed model preference, or worse, a fixed platform partnership, are usually optimising for their margin, not your outcome.
They talk about evaluation, not just deployment. Production AI requires continuous evaluation: hallucination rates, drift, cost per query, and latency. Ask any AI agency how they monitor systems post-launch. The strong ones have an answer involving real metrics and dashboards. The weak ones treat eval as something the customer will figure out.
They are clear about cost and effort. Genuine enterprise AI engagements are not cheap. A serious agentic build can run from £200k for a focused pilot to several million for a full enterprise deployment. London-based firms like Elsewhen publicly cite engagement bands of £650k–£950k for initial AI activation work and £4M–£6M for full agentic enterprise transformations. Numbers in that range are not eye-watering - they are what real work costs. Agencies quoting a tenth of that for the same scope are either selling automation or about to disappoint you.
A practical evaluation framework for founders
Here is the shortlist process we recommend to founders when hiring AI agencies.
Step one: define the workflow first. Before talking to any agency, write down the specific process you want to improve and the metric you want to move. "Reduce time-to-respond on tier-one support tickets from 4 hours to 30 minutes" is a brief that an agency can deliver against. "Use AI to improve customer experience" is a brief that produces a deck and nothing else.
Step two: shortlist by sector fit. Find three to five AI agencies with demonstrated work in your specific industry. If you cannot find any, your shortlist has just become "agencies with demonstrated work in adjacent industries" - and you should weight the engagement toward more discovery and less ambition.
Step three: pressure-test on production. In the first call, ask the production questions above. The agencies that pass move to a paid discovery sprint. The ones that don't are off the list.
Step four: run a paid two-week discovery. Never sign a full engagement from a sales pitch. Pay for two weeks of senior time to scope the work, define the architecture, and write a delivery plan. This is the single best filter in the entire process. Strong agencies relish a paid discovery. Weak ones try to skip it.
Step five: insist on a phased commercial structure. Do not sign a fixed-price twelve-month deal with anyone. Phase the engagement: discovery, build, pilot, production rollout. Each phase has a go/no-go gate. Each phase has its own commercial terms.
Closing: AI agencies are now a category that requires real diligence
A year ago, hiring an AI agency was a discretionary spend. In 2026, it is a strategic decision on the same level as picking your cloud provider or your core engineering stack. The wrong choice does not just waste money - it shapes what your company can and cannot do for the next two years.
The good news is that the strong AI agencies are getting easier to identify. They have production case studies. They have repeat customers. They talk about workflows and evaluation, not models and demos. They will tell you what they cannot do.
Choose carefully.