#MCP – garagegeekblog

June 4, 2026

The Plumbing Under the Hood: RAG, MCP and the Architecture Nobody Explains

[Image: diagram of an LLM connected to CRM, ERP, HR, and SharePoint systems, on a blueprint-style background.]

I’m an under-the-hood type of guy. I hear high-level fluff and I just turn off. I need more. I need to be able to visualise how things work, and what happens when you actually implement them. I guess that’s the Solution Architect in me. Years of seeing projects go south. Experience that says it’s just not that simple.

I learned a long time ago: if you want something doing, do it yourself.

So here it is. No fluff, no hand-waving. The no-nonsense guide to what RAG and MCP actually are, how they work, and why the distinction matters more than most people realise. Enjoy.

The Problem Every Enterprise AI Deployment Hits

Large language models are genuinely extraordinary. The breadth of knowledge, the reasoning capability, the ability to synthesise and explain — it’s real, and it’s useful. But they have a fundamental constraint that every organisation hits the moment they try to deploy one seriously.

They are frozen.

An LLM is trained on a vast corpus of data up to a point in time, and then the weights are fixed. The model doesn’t know what happened last Tuesday. It doesn’t know your organisation’s processes, your customer contracts, your current pipeline, or the policy document your HR team updated this morning. It is, for all its capability, a brilliant mind in a sealed room.

Every enterprise AI deployment is therefore really solving one problem: how do we get relevant, current, organisational knowledge into the model’s hands at the moment it needs to answer?

Two main solutions emerged. They look similar on the surface. They are fundamentally different underneath.

RAG: The Indexed Snapshot

RAG stands for Retrieval-Augmented Generation. The name is less important than the mechanism.

Imagine you have a large knowledge base — policy documents, product guides, training materials, technical specifications. RAG takes all of that content and processes it in advance. Each document gets broken into chunks. Each chunk gets converted into a vector embedding — a numerical representation of the meaning of that text, not just its keywords. Those embeddings get stored in a vector database.
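Here is a minimal sketch of that indexing step, to make the moving parts concrete. Everything in it is an assumption for illustration: chunk_text, toy_embed and build_index are hypothetical names, the chunk sizes are arbitrary, and toy_embed is a hashed bag-of-words stand-in that only captures word overlap. A real pipeline would call a trained embedding model here, and that model is what gives you meaning rather than keywords.

```python
import hashlib
import math

DIM = 64  # toy dimensionality; real embedding models use far more


def chunk_text(text: str, max_words: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows (sizes are arbitrary here)."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def toy_embed(text: str) -> list[float]:
    """Hashed bag-of-words vector, L2-normalised.

    Stand-in only: this reflects shared words, not meaning.
    """
    vec = [0.0] * DIM
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def build_index(docs: dict[str, str]) -> list[dict]:
    """Chunk every document and store one vector per chunk."""
    index = []
    for doc_id, text in docs.items():
        for chunk_no, chunk in enumerate(chunk_text(text)):
            index.append({
                "doc_id": doc_id,
                "chunk_no": chunk_no,
                "text": chunk,
                "vector": toy_embed(chunk),
            })
    return index
```

The list returned by build_index plays the role of the vector database. In production that would be a dedicated store, but the shape of each record (source, chunk, vector) is the part worth remembering.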

When a user asks a question, the question itself gets converted into a vector using the same method. The system then searches the database for the chunks whose meaning is closest to the meaning of the question — semantic similarity, not keyword matching. The most relevant chunks get retrieved and placed into the model’s context window alongside the original question. The model answers using that retrieved material as its working context.
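The query side can be sketched in a few lines. The three-number vectors below are made up by hand so the example stays readable. In a real system both the chunk vectors and the query vector would come from the same embedding model. The function names and the prompt wording are assumptions too.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float], index: list[dict], k: int = 2) -> list[dict]:
    """Return the k chunks whose vectors sit closest to the query vector."""
    ranked = sorted(index, key=lambda e: cosine(query_vec, e["vector"]), reverse=True)
    return ranked[:k]


def build_prompt(question: str, chunks: list[dict]) -> str:
    """Place the retrieved chunks into the context ahead of the question."""
    context = "\n\n".join(c["text"] for c in chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Hand-made vectors, purely illustrative.
INDEX = [
    {"text": "Annual leave is 25 days per year.", "vector": [0.9, 0.1, 0.0]},
    {"text": "Expenses must be filed within 30 days.", "vector": [0.1, 0.9, 0.1]},
    {"text": "Laptops are refreshed every three years.", "vector": [0.0, 0.2, 0.9]},
]

QUESTION = "How much holiday do I get?"
QUESTION_VEC = [0.8, 0.2, 0.1]  # pretend output of the embedding model

hits = top_k(QUESTION_VEC, INDEX, k=2)
prompt = build_prompt(QUESTION, hits)
```

Notice that the question says "holiday" and the best chunk says "annual leave". With real embeddings that match is made on meaning. Here it is faked by the hand-picked vectors, which is exactly the part a real model does for you.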

Think of it as a library. Brilliantly organised, perfectly indexed, searchable by meaning rather than title. You walk in, the system finds the most relevant books, opens them to the right pages, and hands them to the model before it answers.

It’s powerful. For stable, curated knowledge bases it works extremely well.

But it has a ceiling, and the ceiling matters.

The library was shelved at a point in time. The moment your source documents change, your index is stale until you re-embed. And the quality of retrieval is entirely dependent on the quality of what went in. Poorly structured documents, inconsistent language, missing metadata — the embeddings become noisy and retrieval underperforms. The foundational principle holds here as firmly as anywhere in AI: weak data quality at the input stage leads to flawed outputs downstream. RAG doesn’t solve a data quality problem. It inherits it.
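One common way to spot a stale index is to store a content hash next to each indexed document and compare it with the source on a schedule. The sketch below assumes that approach. The function names and the in-memory dictionaries are hypothetical stand-ins for whatever your document store and index metadata actually look like.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Content hash recorded at the time a document was embedded."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_stale(current_docs: dict[str, str],
               indexed_hashes: dict[str, str]) -> dict[str, list[str]]:
    """Compare live documents against the hashes stored with the index."""
    changed = [
        doc_id for doc_id, text in current_docs.items()
        if doc_id in indexed_hashes and fingerprint(text) != indexed_hashes[doc_id]
    ]
    new = [doc_id for doc_id in current_docs if doc_id not in indexed_hashes]
    deleted = [doc_id for doc_id in indexed_hashes if doc_id not in current_docs]
    return {"changed": sorted(changed), "new": sorted(new), "deleted": sorted(deleted)}


# What the index saw when it was last built.
indexed = {
    "leave-policy": fingerprint("Annual leave is 25 days."),
    "old-handbook": fingerprint("Retired content."),
}

# What the source systems hold today.
live = {
    "leave-policy": "Annual leave is 28 days.",    # edited this morning
    "expenses-policy": "File within 30 days.",      # never indexed
}

report = find_stale(live, indexed)
```

Anything under "changed" or "new" needs re-chunking and re-embedding, and anything under "deleted" needs removing from the index. Until that job runs, the model is answering from the old shelf.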

MCP: The Living Plumbing

MCP — the Model Context Protocol — is a different kind of answer to the same problem. And understanding the difference is where the real business thinking begins.

MCP doesn’t retrieve from a pre-built index. It is an open protocol that gives the model a standard way to call tools, and those tools sit in front of live systems and their APIs. The queries happen in real time, at the moment of the conversation.

Here’s what that means practically. Your SharePoint isn’t indexed in advance — the model calls it directly and gets back whatever is there right now, including the contract template someone updated this morning. Your CRM isn’t embedded into vectors — the model queries it and sees the deal that moved stage an hour ago. Your HR system, your procurement platform, your service desk — all of them accessible, all of them current, all of them live.

The model doesn’t see a snapshot of your organisation. It sees your organisation as it actually is, right now.
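To make "live" concrete, here is a toy sketch of a tool call. The message shape follows the JSON-RPC 2.0 tools/call request that MCP uses, but this is not the official SDK. The in-memory CRM dictionary, the crm_get_deal tool and handle_request are all hypothetical, and a real server also handles initialisation, tool discovery and transport.

```python
import json

# Stand-in for a live CRM. In reality this would be an API call.
CRM = {"D-1001": {"name": "Acme renewal", "stage": "Proposal"}}


def crm_get_deal(deal_id: str) -> dict:
    """Hypothetical tool: read the deal as it is right now."""
    return CRM[deal_id]


TOOLS = {"crm_get_deal": crm_get_deal}


def error(request_id, code: int, message: str) -> str:
    return json.dumps({"jsonrpc": "2.0", "id": request_id,
                       "error": {"code": code, "message": message}})


def handle_request(raw: str) -> str:
    """Handle one tools/call request and return a JSON-RPC response."""
    request = json.loads(raw)
    if request.get("method") != "tools/call":
        return error(request.get("id"), -32601, "Method not found")
    params = request.get("params", {})
    handler = TOOLS.get(params.get("name"))
    if handler is None:
        return error(request.get("id"), -32602, "Unknown tool")
    result = handler(**params.get("arguments", {}))
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request["id"],
        "result": {
            "content": [{"type": "text", "text": json.dumps(result)}],
            "isError": False,
        },
    })


REQUEST = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "crm_get_deal", "arguments": {"deal_id": "D-1001"}},
})
```

Send the same request twice, with someone moving the deal to a new stage in between, and the second answer reflects the change. Nothing was re-indexed. There is no index.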

And here is the point that changes how you should think about this entirely.

Most enterprise knowledge isn’t in one place. It never has been. It’s fragmented across Salesforce and SAP, ServiceNow and SharePoint, HR platforms and finance systems and procurement tools. Getting RAG to span those systems requires significant data engineering effort — ingesting, normalising, embedding, maintaining. It’s achievable, but it’s heavy.

MCP can connect to all of them. Each system needs an MCP server in front of its API, but once those are in place the model can reach any of them in the same conversation. It becomes a single conversational interface across the entire technology estate: not just one knowledge base, but the living information fabric of the organisation.
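A rough picture of what "single interface" means in practice: each system publishes its own tools, and the host application merges them into one catalogue the model can pick from. The sketch below assumes that arrangement. The server names, tool names, canned return values and the dotted naming scheme are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str
    handler: Callable[..., object]


# Hypothetical servers, one per system, each exposing its own tools.
SERVERS = {
    "crm": [
        Tool("get_deal", "Look up a deal by id",
             lambda deal_id: {"id": deal_id, "stage": "Negotiation"}),
    ],
    "hr": [
        Tool("get_policy", "Fetch the current text of an HR policy",
             lambda policy: {"policy": policy, "text": "Annual leave is 25 days."}),
    ],
    "servicedesk": [
        Tool("open_tickets", "Count open tickets for a team",
             lambda team: {"team": team, "open": 4}),
    ],
}


def build_catalogue(servers: dict[str, list[Tool]]) -> dict[str, Tool]:
    """Merge every server's tools into one namespace the model can choose from."""
    catalogue = {}
    for server_name, tools in servers.items():
        for tool in tools:
            catalogue[f"{server_name}.{tool.name}"] = tool
    return catalogue


def call(catalogue: dict[str, Tool], qualified_name: str, **arguments):
    """Route a call to whichever system owns the tool."""
    return catalogue[qualified_name].handler(**arguments)


CATALOGUE = build_catalogue(SERVERS)
```

The model sees one list of tools with descriptions. Which system sits behind each one is the plumbing's problem, not the model's.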

That is not a chatbot connected to some documents. That is a fundamentally different proposition.

Not Competitors — Different Layers

It would be tempting to read this as RAG versus MCP. It isn’t.

They solve overlapping problems at different layers and with different trade-offs. RAG is the right tool for large, stable knowledge corpora where semantic similarity search matters — where you need the model to find relevant material even when the exact words don’t appear in the query. MCP is the right tool where data is live, dynamic, and distributed across operational systems.

And they can work together. A well-architected system might use MCP as the orchestration layer — the model deciding which tools to call — while one of those tools triggers a RAG pipeline for a specific stable knowledge base. The plumbing and the library, working in concert.
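Sketched very loosely, the combined pattern looks like this: one tool reads a live system, another tool fronts a retrieval pipeline, and both sit behind the same dispatcher. All names and data here are hypothetical. The word-overlap scoring is a deliberately crude stand-in for vector search, and the choice of tool, which the model would normally make, is passed in by hand.

```python
# Stable knowledge base, indexed in advance (the library).
POLICY_CHUNKS = [
    "Annual leave is 25 days per year plus public holidays.",
    "Expenses must be submitted within 30 days of purchase.",
]

# Live operational system (the plumbing).
CRM = {"D-1001": {"stage": "Negotiation"}}


def search_policies(query: str, k: int = 1) -> list[str]:
    """RAG-style tool. Word overlap stands in for semantic similarity."""
    query_words = set(query.lower().split())
    ranked = sorted(
        POLICY_CHUNKS,
        key=lambda chunk: len(query_words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def crm_get_deal(deal_id: str) -> dict:
    """Live tool. Reads the current record."""
    return CRM[deal_id]


TOOLS = {"search_policies": search_policies, "crm_get_deal": crm_get_deal}


def dispatch(tool_name: str, arguments: dict):
    """The orchestration layer: the model names a tool, the host runs it."""
    return TOOLS[tool_name](**arguments)
```

From the model's side both are just tools. One happens to open the library and the other turns on a tap.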

The practical guidance is straightforward. Start with MCP. It’s the lower barrier to entry: no vector infrastructure to provision, no embedding pipelines to build and maintain, no index to keep fresh. You’re connecting to systems and APIs you already have. Reach for RAG when you’ve hit the ceiling, when the corpus is large, messy, and semantic retrieval across unstructured content becomes essential.

Start simple. Earn the complexity.

Before You Lay The Plumbing — What Nobody Tells You

The pitch for both RAG and MCP is compelling. The reality, as always, has a few sharp edges worth knowing about before you commit.

RAG brings infrastructure with it. RAG isn’t just a software pattern you switch on. Behind every vector database is a compute and storage requirement that needs provisioning, maintaining, and scaling as your knowledge base grows. Embedding pipelines need to run continuously — every time source content changes, chunks need re-processing and re-indexing or your library goes stale. For organisations already managing data centre complexity, this is a real cost conversation that rarely appears in the vendor presentation.

MCP makes your legacy systems load-bearing. MCP’s power is connecting to live systems. But those live systems are now dependencies. The legacy HR platform with the flaky API. The procurement system that slows under load. The CRM with three years of inconsistent data entry. Once the LLM is reaching across your technology estate, it is only as reliable as the weakest system it touches. A timeout, a bad API response, a data quality problem in one system degrades the entire interface. What felt like a peripheral legacy problem just became front and centre.
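One way to stop a single weak system from taking the whole interface down is to give every call a timeout and report failures per system rather than failing the lot. The sketch below assumes that approach, using Python's asyncio. The three lookups are fake, and the delays and the 502 error are invented to simulate a healthy system, a slow one and a broken one.

```python
import asyncio


async def hr_lookup() -> dict:
    await asyncio.sleep(0.01)  # healthy system
    return {"leave_days": 25}


async def procurement_lookup() -> dict:
    await asyncio.sleep(0.5)   # slows under load
    return {"pending_approvals": 3}


async def crm_lookup() -> dict:
    raise ConnectionError("CRM API returned 502")  # flaky API


TOOLS = {"hr": hr_lookup, "procurement": procurement_lookup, "crm": crm_lookup}


async def call_with_timeout(name: str, tool, timeout_s: float):
    """Run one tool, turning timeouts and errors into a labelled result."""
    try:
        data = await asyncio.wait_for(tool(), timeout_s)
        return name, {"ok": True, "data": data}
    except asyncio.TimeoutError:
        return name, {"ok": False, "error": "timeout"}
    except Exception as exc:
        return name, {"ok": False, "error": str(exc)}


async def gather_all(tools: dict, timeout_s: float = 0.05) -> dict:
    """Call every system concurrently and keep whatever came back in time."""
    pairs = await asyncio.gather(
        *(call_with_timeout(name, tool, timeout_s) for name, tool in tools.items())
    )
    return dict(pairs)


results = asyncio.run(gather_all(TOOLS))
```

The model can then be told plainly which systems answered and which did not, instead of guessing. It doesn’t fix the legacy platform. It just stops one bad pipe flooding the house.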

Governance and security are not optional extras. When a model can traverse your entire technology estate — reading CRM data, querying HR systems, pulling procurement approvals — your entire technology estate needs to be ready for that conversation. Access controls, data classification, audit trails, API security, compliance boundaries. These cannot be bolted on after deployment. They need to be designed in from the start. MCP without a holistic governance and security view isn’t just risky. It’s an exposed surface at scale.
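As a sketch of what "designed in" can look like, the wrapper below checks a caller's role before any tool runs and writes an audit entry whether or not the call is allowed. The roles, tool names and log format are assumptions made up for the example. A real deployment would lean on the organisation's identity provider and logging platform rather than a dictionary and a list.

```python
from datetime import datetime, timezone

# Hypothetical role-to-tool permissions.
ROLE_PERMISSIONS = {
    "sales": {"crm.get_deal"},
    "hr_admin": {"crm.get_deal", "hr.get_salary"},
}

AUDIT_LOG: list[dict] = []


def authorised_call(user: str, role: str, tool_name: str, handler, **arguments):
    """Check permission, record the attempt, then run the tool if allowed."""
    allowed = tool_name in ROLE_PERMISSIONS.get(role, set())
    AUDIT_LOG.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "tool": tool_name,
        "arguments": arguments,
        "allowed": allowed,
    })
    if not allowed:
        raise PermissionError(f"{role} may not call {tool_name}")
    return handler(**arguments)


def get_deal(deal_id: str) -> dict:
    return {"id": deal_id, "stage": "Negotiation"}


def get_salary(employee_id: str) -> dict:
    return {"employee_id": employee_id, "band": "C"}
```

The important detail is that the check runs on the person asking, not on the model. The model should never be able to see more than the user sitting in front of it.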

This is AI Reality. The plumbing is powerful. Lay it properly.

The Interface, The Plumbing, The Flow

Here is the frame I want to leave you with — because it’s the one that changes how you brief a customer, evaluate a vendor, or think about your own AI roadmap.

LLMs are becoming the interface to information. Not a search bar, not a dashboard, not a report. A conversational, reasoning interface that sits in front of your organisation’s entire data landscape and makes it accessible in plain language.

MCP is the plumbing. The connective tissue that links the interface to the living systems underneath — the CRM, the ERP, the HR platform, the document store, the data warehouse. Without the plumbing, the interface has nothing to work with. With it, the interface can see everything.

And once you have an interface and plumbing, something else becomes possible.

Agents.

Not models that answer questions. Models that act. That move through systems, make decisions, complete workflows, and hand off to humans at exactly the right moment. Agents ride the pipelines that MCP creates and turn information flow into work getting done.

That’s where this goes. And that’s what the next post is about.

Next: The Agentic Leap — when AI stops answering and starts acting.

The Interface. The Plumbing. The Flow.
