Before RAG, agents, or long-context: three operator floors that decide deployment
What auditors, the EDPS RAG paper, and a tripling in agentic-retrieval intent in Q1 2026 mean for the deployment-shape decision that comes before the choice of retrieval substrate.
For: Heads of Data, CTOs, COOs, InfoSec and Compliance evaluating an enterprise RAG, agentic, or long-context deployment in 2026

"Show me your vector database audit trail" is not a hypothetical future question. Evan Glaser, co-founder of Alongside AI, gave it to InformationWeek on 12 May 2026 as the question legal teams have started asking on every enterprise AI procurement call.
He paired it with a sharper observation: RAG isn't invisible, it's unowned. The system spans legal, information governance, and IT. It is usually built inside AI teams outside any of those control frameworks.
That is the buyer-side moment most mid-market firms are in right now. The engineering team is comparing three substrates: classic Retrieval-Augmented Generation (RAG), agentic-retrieval loops, and the new long-context frontier models. The ground under them is moving. VentureBeat's Q1 2026 RAG Infrastructure Tracker recorded hybrid and agentic retrieval intent tripling from 10.3 to 33.3 percent in a single quarter. 22.2 percent of qualified enterprises now report no production RAG at all. The substrate question is genuinely unstable in May 2026.
The deployment-shape question is not. Three operator-side floors decide the shape regardless of which substrate the engineering team picks. Locking those floors first is the part of the work that does not change when the underlying model does.
This article is about those three floors, what 2026 evidence supports each, and what the substantive counter-argument is. The engineering layer that comes after them is a separate post, rag-is-not-a-search-bar.
The three floors at a glance
- A privacy and residency contract. Where your documents physically live, who can see them, and under what audit and exit terms.
- An audit-ready citation chain. Every load-bearing answer points to a verifiable internal record, with the chain robust enough that an auditor or regulator will accept it as evidence.
- A time-to-cited-answer Service Level Objective (SLO). How fast a knowledge worker actually gets a grounded, sourced answer to a real working question.
Lock those three floors and the retrieval substrate becomes an engineering choice the team can swap as the technology moves. Leave them unlocked and you are underwriting the substrate's behaviour as enterprise policy.
Floor 1: the privacy and residency contract
The privacy floor is the one that hardened most over the last twelve months, particularly for firms operating in the EU and Ireland.
Start with the European Data Protection Supervisor (EDPS). Its 2025 TechSonar paper on RAG, published as official EDPS guidance, names the specific failure mode: outsourcing the retrieval step creates a General Data Protection Regulation (GDPR) Chapter V cross-border transfer problem that the EDPS describes as "particularly challenging" to remedy through contract alone. The retrieval step touches personal data inside the documents being searched. The controller cannot inspect what the retrieval substrate did with that data once it left the perimeter.
That sits inside a stack:
- EU AI Act, Article 26(9). In force for high-risk system deployers from 2 August 2026. Requires the deployer to use the provider's Article 13 information when carrying out its Data Protection Impact Assessment (DPIA). When the provider does not expose the retrieval semantics, the DPIA cannot be discharged.
- Digital Operational Resilience Act (DORA), Article 28. Live since January 2025 for financial entities. Imposes third-party Information and Communications Technology (ICT) contractual clauses (exit, audit, sub-outsourcing) that most managed vector-database vendors cannot satisfy without a Bring-Your-Own-Cloud (BYOC) or Hybrid deployment option.
- Central Bank of Ireland. Public position, restated in its AI guidance and reinforced by Statutory Instrument 366 of 2025: firms remain fully accountable for outcomes generated by AI systems, including where the solutions are developed, procured, or operated by third party providers.
The market noticed. The vendors who can satisfy Floor 1 today, broadly, are the ones who shipped a credible perimeter option:
- Pinecone. BYOC generally available; Services Addendum updated 10 February 2026.
- Weaviate. BYOC across Flex ($45/month) and Premium ($400/month) tiers.
- Qdrant. Hybrid Cloud product carries explicit data-isolation language.
- Azure AI Search. Ships Private Link from its Basic tier upward.
The vendors that cannot (Vespa Cloud dedicated, Vertex AI Search at the canonical SKU, Snowflake-Cortex outside the Snowflake estate) are the ones whose public pricing pages were either timing out or quote-on-request in May 2026. That is itself a signal.
The mid-market test on Floor 1 is procedural, not technical. If a vendor cannot provide:
- a Services Addendum with a defined data residency region,
- a DORA-compatible exit clause, and
- a sub-processor list that names every party your documents transit,
then the privacy floor is not locked. The retrieval quality is a separate conversation that does not happen until the floor is.
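The three-item test above can be written down as a checklist that procurement runs before any retrieval-quality discussion starts. The following is a minimal sketch, not a compliance tool: the class name, field names, and example values are all hypothetical, and whether an exit clause is actually DORA-compatible remains a legal judgement that the boolean only records.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VendorPrivacyTerms:
    """Hypothetical record of what a vendor has committed to in writing."""
    residency_region: Optional[str] = None   # region named in the Services Addendum
    has_exit_clause: bool = False            # legal has confirmed a DORA-compatible exit clause
    sub_processors: List[str] = field(default_factory=list)  # every party documents transit


def floor_one_gaps(terms: VendorPrivacyTerms) -> List[str]:
    """Return the missing items. An empty list means the floor is locked."""
    gaps = []
    if not terms.residency_region:
        gaps.append("no defined data residency region")
    if not terms.has_exit_clause:
        gaps.append("no DORA-compatible exit clause")
    if not terms.sub_processors:
        gaps.append("no named sub-processor list")
    return gaps


# Illustrative values only; they do not describe any real vendor.
example = VendorPrivacyTerms(residency_region="EU (Ireland)", has_exit_clause=True)
print(floor_one_gaps(example))
```

Returning the list of gaps rather than a single pass/fail flag keeps the output usable as the agenda for the next vendor call.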
Floor 2: audit-ready citations
The citation floor is the one that vendor marketing overpromises on most consistently. Three numbers tell the story.
Hallucination rate. Stanford's RegLab, in a paper published in the Journal of Empirical Legal Studies in April 2025, tested three vendor RAG products that publicly market hallucination-free or citation-grounded behaviour: LexisNexis Lexis+ AI, Thomson Reuters Westlaw AI-Assisted Research, and Practical Law's Ask Practical Law AI. The measured hallucination rate sat between 17 and 33 percent across the three products on auditor-grade legal queries. The gap between vendor promise and measured output is not narrow.
Citation accuracy. A separate problem from hallucination. The ALCE citation-evaluation benchmark (2024) places recall accuracy on grounded citations at 66.1 percent (Cohen's kappa 0.525, indicating moderate inter-rater agreement). Roughly one in three citations on a grounded answer points at something that does not support the claim.
Domain variance. RAGAS faithfulness evaluations reported in 2025 failed in 83.5 percent of cases on FinanceBench, against 0.10 percent on PubMedQA. Domain dominates retrieval-evaluation results more than any vendor benchmark page reports.
Regulators are starting to act:
- PCAOB (Public Company Accounting Oversight Board) published a 2025 Spotlight reminding auditors that AI-generated work papers must still trace to verifiable evidence.
- FCA (UK Financial Conduct Authority) issued a research note treating LLM citations as insufficient evidence absent a separate verification process.
- FDA reported public hallucinations from its own internal Elsa AI system.
- Moffatt v. Air Canada (2024). In one Canadian tribunal decision, the company was held liable for an answer its customer-service chatbot generated, on the basis that the chatbot's source was the company's own published policy and the buyer was entitled to rely on the citation.
The precedent value across jurisdictions is uncertain. The direction of regulator attention is not.
The mid-market test on Floor 2 is the artefact your vendor produces when a real auditor asks for the citation chain on a real answer.
- If the artefact is a screenshot of the chat window with a footnote, the floor is not locked.
- If the artefact is a structured log capturing the document identifier, the retrieved chunk, the chunk hash, the timestamp, and the generation parameters, the floor is.
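To make the second case concrete, here is a minimal sketch of what one entry in such a structured log might look like, using only the Python standard library. The five fields mirror the ones listed above; the record layout, function names, and the choice of SHA-256 are assumptions for illustration, not a format any vendor or regulator prescribes.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class CitationRecord:
    document_id: str                    # identifier of the internal source record
    chunk_text: str                     # the retrieved chunk exactly as given to the model
    chunk_sha256: str                   # hash of chunk_text, so later edits are detectable
    retrieved_at: str                   # ISO 8601 timestamp in UTC
    generation_params: Dict[str, Any]   # e.g. model name, temperature


def make_record(document_id: str, chunk_text: str,
                generation_params: Dict[str, Any],
                now: Optional[datetime] = None) -> CitationRecord:
    """Build one log entry at retrieval time."""
    now = now or datetime.now(timezone.utc)
    digest = hashlib.sha256(chunk_text.encode("utf-8")).hexdigest()
    return CitationRecord(document_id, chunk_text, digest,
                          now.isoformat(), dict(generation_params))


def chunk_matches(record: CitationRecord, current_text: str) -> bool:
    """True if the text fetched today still hashes to the logged value."""
    current = hashlib.sha256(current_text.encode("utf-8")).hexdigest()
    return current == record.chunk_sha256


def to_log_line(record: CitationRecord) -> str:
    """Serialise the record as one JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)
```

The hash is what turns the log into evidence: an auditor can re-fetch the document, call `chunk_matches`, and see whether the cited passage has changed since the answer was generated.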
Stanford's measured 17 to 33 percent vendor hallucination rate indicates the scale of the exposure when this goes wrong.
Floor 3: the time-to-cited-answer SLO
The third floor is the one most firms skip. Every RAG return-on-investment (ROI) argument hinges on a metric most companies do not measure: the baseline time a knowledge worker takes today to produce a grounded, sourced answer to a question for which the source documents already exist inside the organisation.
The MDPI 2025 systematic review of 63 enterprise RAG studies stated explicitly that the metric heterogeneity across published deployments prevented any meta-analysis of ROI. When the published academic literature cannot aggregate the ROI claims, the vendor case studies certainly cannot.
The point is not that RAG fails to reduce search time. The point is that almost no firm measures the baseline well enough to know whether their own deployment moved the metric, or whether their knowledge workers ignored the RAG answer and went to the same shared drive they used in 2024.
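Measuring whether the metric moved does not need much machinery. As a rough sketch, assuming a firm has timed a sample of real questions before deployment and a comparable sample after, the comparison could be as simple as a change in medians. The function name and the sample figures are hypothetical.

```python
from statistics import median
from typing import Sequence


def median_change(baseline_seconds: Sequence[float],
                  deployed_seconds: Sequence[float]) -> float:
    """Relative change in median time-to-cited-answer; negative means faster."""
    if not baseline_seconds or not deployed_seconds:
        raise ValueError("both samples must be non-empty")
    before = median(baseline_seconds)
    after = median(deployed_seconds)
    return (after - before) / before


# Illustrative timings in seconds, not measured data.
change = median_change([600, 900, 1200], [300, 450, 600])
print(f"median time-to-cited-answer changed by {change:+.0%}")
```

The median is used here instead of the mean so that a handful of very slow lookups does not dominate the result. That is a design choice, not a requirement.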
A Service Level Objective (SLO) on time-to-cited-answer is the floor's lock. It needs three components:
- Scope. The population of question types the SLO covers. The vendor can disclaim everything outside that population, but inside it they are accountable.
- Latency target. Time from question to first cited paragraph. Real targets are in the 5-to-15-second range for ambient document corpora.
- Citation-precision threshold. Below this, the SLO is considered missed. Stanford RegLab's 17 to 33 percent failure rate is what untreated procurement looks like.
The SLO is the contract that says "if your deployment cannot do this, we have not bought what was sold."
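The three components can be expressed as a small check that runs over a sample of logged answers. This is a sketch under stated assumptions: the class and field names are hypothetical, the 0.9 precision threshold in the usage example is illustrative and does not come from any source cited here, and requiring every in-scope answer to meet the latency target is a deliberately strict simplification (a percentile target would be a common alternative).

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, Sequence


@dataclass(frozen=True)
class CitedAnswerSLO:
    in_scope_types: FrozenSet[str]   # scope: question types the SLO covers
    max_latency_seconds: float       # question to first cited paragraph
    min_citation_precision: float    # below this, the SLO is missed


@dataclass(frozen=True)
class AnswerSample:
    question_type: str
    latency_seconds: float
    citations_checked: int           # citations a reviewer verified
    citations_supported: int         # of those, how many supported the claim


def evaluate(slo: CitedAnswerSLO, samples: Sequence[AnswerSample]) -> Dict[str, Any]:
    """Score in-scope samples against the SLO; out-of-scope samples are ignored."""
    in_scope = [s for s in samples if s.question_type in slo.in_scope_types]
    if not in_scope:
        return {"in_scope": 0, "latency_met": None, "precision": None, "slo_met": None}
    latency_met = all(s.latency_seconds <= slo.max_latency_seconds for s in in_scope)
    checked = sum(s.citations_checked for s in in_scope)
    supported = sum(s.citations_supported for s in in_scope)
    precision = supported / checked if checked else 0.0
    slo_met = latency_met and precision >= slo.min_citation_precision
    return {"in_scope": len(in_scope), "latency_met": latency_met,
            "precision": precision, "slo_met": slo_met}


# Hypothetical SLO: 15-second latency target, 0.9 precision threshold.
slo = CitedAnswerSLO(frozenset({"policy_lookup"}), 15.0, 0.9)
samples = [
    AnswerSample("policy_lookup", 8.0, 5, 5),
    AnswerSample("policy_lookup", 12.0, 5, 4),
    AnswerSample("open_research", 60.0, 3, 1),   # out of scope, so disclaimed
]
print(evaluate(slo, samples))
```

Keeping scope as an explicit field is what lets the vendor disclaim out-of-scope questions while staying accountable for the rest.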
What the substrate-is-moving counter-argument actually says
The strongest counter to the three-floor framing is that it is too static.
Andrej Karpathy's April 2026 LLM-Wiki essay, picked up by VentureBeat with a wide subsequent reach, reframed single-pass RAG as a stateless and amnesiac re-discovery loop. His proposed direction (a persistent LLM-maintained wiki updated by agentic loops) does not slot under any of the three floors as written.
VentureBeat's Q1 2026 tracker corroborated the movement underneath. Intent toward agentic retrieval tripled in a quarter. More telling: actual paused-or-cancelled RAG programmes concentrated in healthcare, education, and government, which are the same regulated sectors this article is addressed to.
The counter-argument is correct on one thing. The substrate is in flux. Operators who hard-code a 2024-vintage classic-RAG architecture as policy will look foolish within two release cycles.
It does not show that the three floors are wrong. The privacy contract, the citation chain, and the time-to-cited-answer SLO are substrate-independent. They apply equally to a vector-DB RAG, to a long-context loaded-doc model, to an agentic retrieval graph, and to whatever comes after. The point of writing the deployment shape into the procurement contract before the engineering team picks the substrate is precisely that the engineering team can re-pick the substrate later without re-opening the floors.
A reader who buys Karpathy's argument should still lock the floors. A reader who does not should still lock the floors. Locking the floors is the buyer's job.
Where this lands
The mid-market firms that get this right in 2026 are the ones who run two rooms. In the first room, Heads of Data, COOs, and Heads of Compliance write the floors. In the second room, the engineering team picks the substrate.
Once the floors are locked, the engineering choice between classic RAG, an agentic loop, or a long-context model reduces to the questions rag-is-not-a-search-bar covers: chunking, retrieval rank quality, reranker selection, and the ingestion-pipeline failure modes that have nothing to do with the LLM.
Conflate the two rooms and you fail in one of two ways:
- Over-buy. Lock in a five-year contract with a vendor whose architecture will look dated in eighteen months.
- Under-buy. Sign on retrieval quality alone, then discover later that the audit trail your compliance team needs was never specified.
The Irish and EU regulatory stack does not give the option of skipping the first room. The market for substrates is moving fast enough that staying out of the second is also a choice.
Productivity is easier than profit, and a locked floor is easier than an unlocked one.