Custom AI in 2026: four buy signals, three build signals

If you have been pitched a custom AI build and an off-the-shelf agent platform in the same quarter, you are not imagining the whiplash. The two proposals usually disagree about scope, cost, timeline, and what the work even is. The honest May 2026 answer to which one fits is: buy first, and build only with a reason. Those reasons are narrower than most agency proposals (including the ones we send) admit.

This piece sets out the seven signals we walk operators through when the question lands on our desk: four that say buy a platform this quarter, three that say a custom build is genuinely the right call. It is written for the person who has to defend the decision in a board meeting, not for the search engine.

The market flipped in 2025, and most agency proposals have not caught up

In 2023, "we need an AI capability" usually meant "we will build it". The off-the-shelf tooling was thin, the platforms were demo-grade, and a custom build was a defensible default even for relatively generic workflows. That world is gone.

Three datapoints, all from 2025 or 2026, set the new baseline.

First, MIT's NANDA initiative published "The GenAI Divide: State of AI in Business 2025" (Fortune coverage, August 2025: https://fortune.com/2025/08/18/mit-report-95-percent-generative-ai-pilots-at-companies-failing-cfo/). The headline finding most people quote is that 95 percent of generative AI pilots stall. The finding that matters more for this question is buried further in: purchasing AI tools from specialised vendors and building partnerships succeeded about 67 percent of the time in the study, while internal builds succeeded only one-third as often.

Second, Forrester predicts 75 percent of companies attempting to build their own agentic systems will fail (cited in beam.ai's 2026 enterprise AI roundup: https://beam.ai/agentic-insights/the-great-ai-flip-why-76-of-enterprises-stopped-building-ai-in-house, January 2026). The reason given is not laziness or cost: it is that production agentic systems need diverse models, mature RAG (retrieval-augmented generation) stacks, advanced data architectures, and niche evaluation expertise, and most mid-market firms do not have that bench.

Third, the share of enterprise AI use cases deployed via third-party solutions sits at roughly 76 percent in early 2026, up from about a 50/50 split two years earlier (same beam.ai source, January 2026). The flip is not subtle.

That is the macro picture. None of it says custom AI is dead. It says the base case changed: a mid-market operator in 2026 should start from "what platform already does this" rather than "what would we build". Most of the agency proposals still being sent default to the 2023 shape. Read them with the new baseline in mind.

Four signals that say buy

If three of the four below hold, custom is almost always the wrong call this quarter. The platform option is not perfect, but it will get you to live faster, at a fraction of the cost, and the maintenance is somebody else's job.

Signal one: a published platform already covers the workflow shape. For customer service agents, Sierra has Fortune 50 reference customers and contracts that typically start around USD 150,000 per year (Quiq pricing analysis, January 2026: https://quiq.com/blog/sierra-ai-pricing/). For an internal LLM orchestration and observability layer, Vellum's paid plans start at USD 25 per month with a free startup tier (ZenML pricing guide, March 2026: https://www.zenml.io/blog/vellum-ai-pricing). For agentic workflows that have to run portably across model vendors, Anthropic shipped Agent Skills as an open standard in 2025 and OpenAI quietly adopted a structurally identical layer in its Agents SDK in 2026 (VentureBeat, April 2026: https://venturebeat.com/technology/anthropic-launches-enterprise-agent-skills-and-opens-the-standard). If your workflow is a recognisable shape (customer service, document capture and approval, internal analytics, sales operations), the platform layer has caught up to where you would have wanted to land anyway.

Signal two: data sensitivity is normal. "Normal" means it is not patient data, not unpublished financial records of a regulated entity, not source code of your core product, not customer data that contractually cannot leave a specific geography. If your workflow runs on data that is sensitive in the everyday business sense but not in the regulated sense, a platform with standard SOC 2 and GDPR (General Data Protection Regulation) controls clears the bar. The EU AI Act's full applicability date is 2 August 2026 (Holland and Knight, April 2026: https://www.hklaw.com/en/insights/publications/2026/04/us-companies-face-eu-ai-acts-possible-august-2026-compliance-deadline), but for most non-high-risk use cases its obligations attach to providers and deployers regardless of build-or-buy, and good platform vendors are further along on documentation than most in-house builds will be by August.

Signal three: time-to-value matters more than per-token economics. The scale crossover where a custom build wins on unit economics sits roughly at one million conversations per year for a typical chat or agent workload (Scale's build-vs-buy guide, 2026: https://scale.com/guides/build-vs-buy). Below that, the maintenance and infrastructure burden of a custom build eats whatever per-token savings it nominally produces. Above it, the maths can flip. Most mid-market firms running back-office workflows are nowhere near the crossover.

Signal four: you do not have ML engineering capacity in the building. Custom AI development is not custom application development with an LLM call sprinkled in. It needs model selection, evaluation harnesses, prompt and tool-call regression suites, observability, and an on-call rotation that understands what a confidence-score regression actually means. Roughly 20 to 30 percent of the initial build cost shows up as annual maintenance to manage model drift and the MLOps surface (Riseup Labs AI agent cost breakdown, February 2026: https://riseuplabs.com/ai-agent-development-cost/). If you are hiring the first ML engineer to maintain the system you are about to commission, the platform option is doing your hiring plan a favour.
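To make "regression suite" concrete, here is a minimal sketch of the kind of gate an in-house team ends up owning. It assumes (hypothetically) that every model or prompt change is scored per test case on a 0 to 1 scale by some grader you already have; the function name, case names and thresholds are illustrative, not taken from any particular MLOps tool.

```python
from statistics import mean


def regression_gate(
    baseline: dict[str, float],
    candidate: dict[str, float],
    max_mean_drop: float = 0.02,
    max_case_drop: float = 0.20,
) -> list[str]:
    """Compare per-case eval scores (0 to 1) for a candidate model or prompt change
    against the current production baseline. Returns failure reasons; empty means pass."""
    failures: list[str] = []

    # A candidate that silently skips eval cases should never pass.
    missing = sorted(set(baseline) - set(candidate))
    if missing:
        failures.append(f"missing eval cases: {missing}")

    shared = sorted(set(baseline) & set(candidate))
    if not shared:
        failures.append("no shared eval cases to compare")
        return failures

    # Aggregate check: has average quality dropped more than we tolerate?
    drop = mean(baseline[c] for c in shared) - mean(candidate[c] for c in shared)
    if drop > max_mean_drop:
        failures.append(f"mean score dropped by {drop:.3f}")

    # Per-case check: a healthy average can hide one workflow that broke badly.
    for case in shared:
        if baseline[case] - candidate[case] > max_case_drop:
            failures.append(
                f"{case} regressed from {baseline[case]:.2f} to {candidate[case]:.2f}"
            )
    return failures


# Hypothetical scores for three customer-service eval cases.
baseline = {"refund_request": 0.92, "address_change": 0.88, "escalation": 0.95}
candidate = {"refund_request": 0.93, "address_change": 0.61, "escalation": 0.94}
print(regression_gate(baseline, candidate))
```

Someone has to maintain the eval cases, tune the thresholds and answer the page when this gate fails after a vendor model update. That ongoing work is the maintenance line, and on a platform most of it is the vendor's problem.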

Three signals that say build

The build case is real. It is narrower than most agency proposals suggest, but it exists, and skipping it for the wrong reasons leaves measurable value on the table.

Signal one: the workflow is a core competitive differentiator. The fitness test is uncomfortable. If a competitor in your market could license the same platform tomorrow and reach feature parity by the end of the quarter, the workflow is not core: it is generic. Generic workflows belong on platforms. Core workflows, where the system embeds proprietary process knowledge, proprietary data, or a specific take on the customer relationship that the platform cannot model, are the genuine build territory. If you cannot say in one sentence why a platform copy of this would not be good enough, you are looking at a generic workflow.

Signal two: data residency or regulation rules out multi-tenant SaaS. Healthcare contact centres adding voice AI inherit HIPAA (Health Insurance Portability and Accountability Act) obligations on top of AI rules. Financial services firms deploying AI for lending decisions inherit existing lending regulations alongside AI Act obligations. Public sector and defence work routinely requires UK or EU-only data residency that mainstream multi-tenant platforms cannot guarantee for primary workflows. (Sources: Workstreet's EU AI Act guide for SaaS, 2026: https://www.workstreet.com/blog/eu-ai-act-compliance; Parloa AI privacy summary, 2026: https://www.parloa.com/blog/AI-privacy-2026/). In those settings, a custom build on a dedicated tenant (or on-prem) is not a luxury. It is the only deployment that survives a procurement review.

Signal three: you have crossed the unit-economics threshold. This is the seven-figure-conversation case (or its equivalent: very long context windows, high-frequency tool use, latency floors that platforms cannot hit). The maths gets specific. If you are processing AI-driven steps at the scale where a platform's metered pricing approaches a six-figure annual line, a custom build on direct model APIs starts to make economic sense even before you count any other benefit. Below that scale, the maths does not work.
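The crossover can be written down as a simple break-even. A sketch, assuming metered platform pricing per conversation, a build cost amortised over a fixed number of years, annual maintenance as a percentage of build cost, and a direct model-API cost per conversation. The per-conversation prices in the example are hypothetical placeholders, not quotes from any vendor; substitute your own.

```python
from typing import Optional


def crossover_volume(
    build_cost: float,
    amortisation_years: float,
    maintenance_rate: float,
    platform_price_per_conv: float,
    api_cost_per_conv: float,
) -> Optional[float]:
    """Annual conversations above which a custom build costs less than metered platform pricing."""
    # What each conversation saves you by calling the model API directly instead of the platform.
    margin = platform_price_per_conv - api_cost_per_conv
    if margin <= 0:
        return None  # the platform is no dearer per conversation, so the build never catches up
    # Fixed yearly cost of owning the build: amortised build plus maintenance.
    fixed_per_year = build_cost / amortisation_years + maintenance_rate * build_cost
    return fixed_per_year / margin


# Illustrative inputs only: a USD 250,000 build (inside the mid-complexity band),
# amortised over three years, 25 percent annual maintenance, and hypothetical
# prices of USD 0.20 per conversation (platform) vs USD 0.05 (direct model API).
v = crossover_volume(250_000, 3, 0.25, 0.20, 0.05)
print(f"{v:,.0f} conversations per year")  # 972,222 conversations per year
```

With those assumed inputs the break-even lands just under a million conversations a year, at which point the platform bill is already close to USD 200,000. The answer is most sensitive to the per-conversation margin, so get a real metered quote before running it.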

A useful sanity check: if you cannot tick at least two of those three signals, the build case is probably not real. We turn down build proposals on this test more often than we accept them.
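The two thresholds (three of four buy signals, two of three build signals) fit in a few lines if you want to record the assessment alongside a proposal. This is a hypothetical scorecard, not a tool we ship; the field names simply paraphrase the seven signals.

```python
from dataclasses import astuple, dataclass


@dataclass
class BuySignals:
    platform_covers_workflow: bool
    data_sensitivity_normal: bool
    time_to_value_beats_unit_cost: bool
    no_ml_engineering_in_house: bool


@dataclass
class BuildSignals:
    core_differentiation: bool
    residency_or_regulation_rules_out_saas: bool
    past_unit_economics_threshold: bool


def assess(buy: BuySignals, build: BuildSignals) -> dict:
    # Each signal is a bool, so summing the fields counts how many hold.
    buy_count = sum(astuple(buy))
    build_count = sum(astuple(build))
    return {
        "buy_signals": buy_count,
        "build_signals": build_count,
        # Thresholds: three of four says buy this quarter; two of three says the build case is real.
        "buy_this_quarter": buy_count >= 3,
        "build_case_real": build_count >= 2,
    }


print(assess(BuySignals(True, True, True, False), BuildSignals(False, False, True)))
```

The sketch deliberately reports both verdicts instead of picking a winner: if both come back true, that is a scoping conversation, not something a scorecard should settle.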

The platform failure case (counter-thesis)

We owe the operator the other side, because the buy-first framing has a real failure mode and the post would be dishonest without it.

The pattern looks like this. An operator picks a published agent platform on a demo that handles a clean test transcript well. The platform vendor's pricing comes in at, say, USD 200,000 per year all-in, which compares favourably against an agency build quote of GBP 250,000 plus 20 percent annual maintenance. The integration project starts. Six months in, the platform is live in a corner of the business, but the unique-to-your-firm parts of the workflow (the data model, the exception handling, the integrations with the practice management or ERP system, the audit trail your regulator wants) sit in glue code that you wrote, not in the platform. The platform vendor pushes a roadmap that changes the contract surface twice a year, your glue code breaks, and the annual cost has crept past the original build quote because of integration consultancy days nobody priced in.

This is not theoretical, and it is the version of the build-vs-buy decision that operators most often regret in retrospect, because the platform's brochure-page demos and the operator's actual workflow are not the same artefact. The MIT NANDA report flagged it directly: generic tools excel for individuals because of flexibility, but they stall in enterprise use since they do not learn from or adapt to specific workflows (Fortune coverage of MIT NANDA, August 2025).

How to spot the failure case in scoping. Three questions, asked of any platform vendor before signing:

Show me three named reference customers in our size range and sector. (If the references are all USD 5bn-plus public companies, the platform may not fit a 50-person firm's reality.)
What proportion of customers at our size are using the feature we are buying you for, and not just the brand-name capability? (If your specific use case is in beta or "coming this year", you are paying for a roadmap, not a product.)
What is the integration consultancy day-rate from the vendor's professional services team, and how many days do similar customers report needing in year one? (If the answer is "it depends" with no range, you have an unbounded integration project.)
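The third question turns straight into a year-one budget line. A minimal sketch, assuming the vendor gives a day-rate and a low-to-high range of days; the rate and range below are hypothetical, and a missing range is treated as the red flag it is rather than defaulted to a guess.

```python
from typing import Optional, Tuple


def year_one_integration_budget(
    day_rate: float, days_range: Optional[Tuple[int, int]]
) -> Tuple[float, float]:
    """Return (low, high) year-one integration spend from the vendor's own answers."""
    if days_range is None:
        # "It depends" with no range: refuse to produce a number.
        raise ValueError("no day range given: treat the integration as unbounded")
    low, high = days_range
    if low < 0 or low > high:
        raise ValueError(f"implausible day range: {days_range}")
    return day_rate * low, day_rate * high


# Hypothetical vendor answer: USD 1,500 per day, 20 to 60 days in year one.
low, high = year_one_integration_budget(1_500, (20, 60))
print(f"Year-one integration: USD {low:,.0f} to USD {high:,.0f}")
```

Add the high end to the platform's annual fee before comparing it with a build quote; that is the number the failure case above quietly leaves out.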

The platform option is the right call in 2026 more often than it was in 2023. It is not the right call every time, and a platform purchase that turns into a stalled integration is worse than either the well-scoped build or the well-scoped buy.

What custom AI actually costs in 2026, when it is the right call

If you are in build territory, the honest numbers below are the May 2026 starting point. They are wider ranges than agency rate cards usually present, because the work itself varies wildly.

Scoped proof of concept (one workflow, one model provider, no production hardening): USD 50,000 to USD 150,000, six to ten weeks (Kellton 2026 cost guide: https://www.kellton.com/kellton-tech-blog/custom-ai-development-cost; TechAhead enterprise AI development cost: https://www.techaheadcorp.com/blog/enterprise-ai-development-cost/).

Mid-complexity production build (integrated with one or two existing systems, evaluation harness, MLOps): USD 75,000 to USD 250,000, eight to fourteen weeks (Riseup Labs AI agent cost breakdown, February 2026: https://riseuplabs.com/ai-agent-development-cost/).

Full enterprise deployment with multi-model architecture, organisation-wide rollout, custom LLM components: USD 250,000 to USD 1,000,000-plus, four to nine months. The most common landing zone for full production deployments with enterprise integrations and MLOps pipelines is USD 250,000 to USD 500,000 (TechAhead, 2026).

Blended team rates in the agency market sit at roughly USD 150 to USD 350 per hour in 2026, with senior ML engineers at USD 200 to USD 300 and AI architects at USD 275 to USD 400 (Digital Agency Network AI pricing guide, 2026: https://digitalagencynetwork.com/ai-agency-pricing/). Irish and UK market rates run lower than the US-headline figures in those guides, in the range of EUR 110 to EUR 220 per hour for equivalent roles based on our own bench and the rates we see in competitive tenders.

The hidden costs everyone misses. Data preparation and quality remediation typically eat 15 to 25 percent of total project cost, and up to 30 to 40 percent in data-intensive deployments (Riseup Labs, 2026). Ongoing operational costs (monitoring, retraining, infrastructure scaling) add 15 to 30 percent of the initial build cost per year. If you are budgeting only for the build, at the upper end of those ranges you are budgeting for little more than half the actual cost over a three-year horizon.
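A quick way to check the three-year arithmetic against your own quote. This sketch assumes operational costs start in year one and run in every year of the horizon, and that data preparation is already inside the build figure; the USD 200,000 build cost is illustrative.

```python
def three_year_cost(
    build_cost: float, annual_ops_rate: float, years: int = 3
) -> tuple[float, float]:
    """Return (total cost over the horizon, share of that total the build quote covers).
    Assumes operational costs run in every year of the horizon, including year one."""
    total = build_cost + build_cost * annual_ops_rate * years
    return total, build_cost / total


# Both ends of the 15 to 30 percent annual operational range.
for rate in (0.15, 0.30):
    total, share = three_year_cost(200_000, rate)
    print(f"ops at {rate:.0%}/yr: total USD {total:,.0f}, build quote covers {share:.0%}")
```

Under these assumptions the build quote covers roughly 69 percent of the three-year cost at the low end of the operational range and roughly 53 percent at the high end.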

How long does it take, and what does the process look like

A scoped PoC: six to ten weeks. A mid-complexity production build: three to four months end-to-end. A full enterprise deployment: six to nine months, and longer if a regulatory approval gate sits in the critical path.

The process, where we run it, has six phases:

Discovery and scoping (one to two weeks): we map the workflow, the data, the integration surface, and the success metric. Most decisions worth making come out of this phase, occasionally including the decision not to build.
Data preparation (two to four weeks, longer if the data is bad): the unsexy phase that determines whether anything else works.
Prototype on real data (two to four weeks): a working system on a slice of the production data, evaluated against the metric defined in discovery.
Production hardening (three to six weeks): evaluation harness, observability, integrations, security review.
Pilot (two to six weeks): live with a defined user group and a measurable outcome.
Production rollout and maintenance (ongoing): backed by a maintenance contract that funds the model-drift surface.
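Reading the phase estimates as strictly sequential gives a best-case and worst-case schedule before rollout. A small sketch under that assumption (in practice some phases overlap), with a hypothetical `extra_weeks` parameter for the "longer if the data is bad" case.

```python
# Week ranges per phase as stated in the six-phase process; the final phase
# (rollout and maintenance) is ongoing, so it is left out of the schedule.
PHASES = [
    ("Discovery and scoping", 1, 2),
    ("Data preparation", 2, 4),
    ("Prototype on real data", 2, 4),
    ("Production hardening", 3, 6),
    ("Pilot", 2, 6),
]


def schedule_range(phases, extra_weeks=None):
    """Sum best- and worst-case weeks, assuming phases run one after another.
    extra_weeks maps a phase name to additional weeks, e.g. for bad data."""
    extra_weeks = extra_weeks or {}
    best = sum(low + extra_weeks.get(name, 0) for name, low, _ in phases)
    worst = sum(high + extra_weeks.get(name, 0) for name, _, high in phases)
    return best, worst


print(schedule_range(PHASES))                           # (10, 22)
print(schedule_range(PHASES, {"Data preparation": 4}))  # (14, 26)
```

The spread between the two ends is mostly hardening and pilot, which is why a vague success metric (the pilot's exit condition) stretches timelines more than any modelling choice.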

The two phases that go wrong most often are data preparation (because nobody budgets enough time for it) and pilot (because the success metric agreed in discovery was too vague to settle the launch debate). Both are fixable by writing the metric concretely in discovery, before any model selection conversation.
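"Concrete" here means the metric can settle the launch debate on its own: a named measure, a unit, a baseline, a target, a direction and a measurement window. A minimal sketch of that shape; the invoice-handling example and its numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessMetric:
    name: str
    unit: str
    baseline: float          # measured before the pilot starts
    target: float            # the number that counts as success
    higher_is_better: bool
    window_days: int         # how long the pilot is measured before the launch decision

    def passed(self, observed: float) -> bool:
        # Direction matters: time-to-process should fall, accuracy should rise.
        if self.higher_is_better:
            return observed >= self.target
        return observed <= self.target


invoice_time = SuccessMetric(
    name="median handling time per supplier invoice",
    unit="minutes",
    baseline=14.0,
    target=6.0,
    higher_is_better=False,
    window_days=30,
)
print(invoice_time.passed(5.5), invoice_time.passed(7.2))  # True False
```

If any field cannot be filled in during discovery, the metric is not yet concrete enough to end a pilot.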

Picking a partner if you decide to build

The signals we look for, when an operator asks us to recommend a partner (sometimes it is us, sometimes it is not):

A track record of shipped production AI, not just pilots. Ask for three live systems with named clients and a description of the production failure mode each one survived. Pilot-heavy portfolios are a yellow flag.

An evaluation-and-observability practice, not a prompt-engineering practice. Ask how they evaluate model changes in production and what their last regression looked like. If the answer is hand-wavy, the production hardening will be too.

Industry fit. AI for healthcare is not AI for retail. The partner should be able to name the regulatory and operational constraints in your sector without prompting.

Realism on cost and timeline. A partner who quotes the lowest number is not doing you a favour. A partner who quotes a range and explains what moves it is.

Honest scoping. The right partner will at least sometimes tell you to buy a platform instead. We turn down build engagements that we think a platform would serve better, and we expect any partner worth working with to do the same.

Where Appify fits

We are an Ireland and UK AI development partner working primarily with mid-market firms (turnover roughly EUR 5m to 100m). Our typical engagements are a scoped PoC or a mid-complexity production build, in healthcare, accounting, logistics, fleet, and retail. We say buy the platform more often than agencies typically do, because we have watched the platform layer mature and the failure rate on internal builds without the right bench is what it is.

If you are sitting on both proposals this quarter and want a second opinion, the first conversation is free. We will tell you which signals we see, what the honest cost and timeline range looks like, and whether we think a platform purchase would get you there faster. Sometimes we are the right partner. Sometimes the answer is the platform and we will say so.
