Click for full blog post: The Plumbing Under the Hood: RAG, MCP and the Architecture Nobody Explains

Pip: If you’ve ever nodded along while someone explained AI architecture and understood roughly zero percent of it, garagegeek’s recent posts are for you — and honestly, maybe for that someone too.

Mara: This episode covers the infrastructure behind enterprise AI: how models actually get access to current information, and what that plumbing looks like in practice.

Pip: Let’s start with the architecture itself.

The Plumbing Under the Hood: RAG, MCP and the Architecture Nobody Explains

Mara: The core problem the post is solving is this: large language models are frozen. They’re trained up to a point in time, and then that’s it — they don’t know what changed this morning, or what’s in your CRM, or what your HR team updated last week.

Pip: The post puts it plainly: “An LLM is, for all its capability, a brilliant mind in a sealed room.” Every enterprise deployment is essentially one long attempt to pass notes through the door.

Mara: Right, and two distinct approaches emerged to solve that. The first is RAG, Retrieval-Augmented Generation. Documents get chunked, each chunk is converted into a vector embedding, and the vectors are stored in an index. When a question comes in, the system embeds the question the same way, finds the chunks whose embeddings are closest to it in meaning, and hands them to the model as context.

Pip: So it’s a very well-organized library. The catch being the library was shelved at a point in time.
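To make that pipeline concrete, here is a minimal Python sketch of the chunk, embed, store and retrieve loop. It is an illustration under loud assumptions, not the post's implementation. The `embed` function is a toy bag-of-words counter standing in for a real embedding model, the "vector store" is a plain list, and every function name and document in it is hypothetical.

```python
import math
import re
from collections import Counter

def chunk(text: str, max_words: int = 40) -> list[str]:
    """Split a document into fixed-size word windows (a deliberately naive chunker)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_index(docs: dict[str, str]) -> list[tuple[str, str, Counter]]:
    """Chunk every document and store (doc_id, chunk_text, vector) rows."""
    return [(doc_id, piece, embed(piece))
            for doc_id, text in docs.items()
            for piece in chunk(text)]

def retrieve(index: list[tuple[str, str, Counter]], question: str, k: int = 2) -> list[tuple[str, str]]:
    """Embed the question the same way as the chunks and return the k closest ones."""
    q = embed(question)
    ranked = sorted(index, key=lambda row: cosine(q, row[2]), reverse=True)
    return [(doc_id, text) for doc_id, text, _ in ranked[:k]]

def build_prompt(question: str, hits: list[tuple[str, str]]) -> str:
    """Hand the retrieved chunks to the model as context."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Hypothetical documents for illustration only.
docs = {
    "leave-policy": "Employees accrue 25 days of annual leave per year.",
    "expenses": "Submit expense claims within 30 days with receipts attached.",
}
index = build_index(docs)
question = "How many days of annual leave do employees get?"
hits = retrieve(index, question, k=1)
print(build_prompt(question, hits))
```

Swapping the toy `embed` for a real embedding model and the list for a vector database changes the scale, not the shape: the question is embedded, compared against stored chunk vectors, and the top matches become the model's context.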

Mara: Exactly. The post is direct about the ceiling: if source documents change, the index goes stale until you re-embed. And there’s a data quality dependency that doesn’t go away — “weak data quality at the input stage leads to flawed outputs downstream. RAG doesn’t solve a data quality problem. It inherits it.”

Pip: That line should probably be laminated and handed out at every vendor presentation.
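The staleness ceiling follows directly from that design: the index only knows what was true when each chunk was embedded. One common mitigation, assumed here rather than taken from the post, is to record a content fingerprint per document at indexing time and compare it against the source on a schedule. The helper names and sample documents below are hypothetical.

```python
import hashlib

def fingerprint(text: str) -> str:
    """Content hash recorded alongside a document's chunks when it is embedded."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def stale_documents(indexed: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare fingerprints captured at indexing time with the live source documents.

    indexed: doc_id -> fingerprint stored when the document was last embedded
    current: doc_id -> document text as it exists in the source system today
    """
    changed = [d for d in current if d in indexed and indexed[d] != fingerprint(current[d])]
    added = [d for d in current if d not in indexed]
    removed = [d for d in indexed if d not in current]
    return {"changed": sorted(changed), "added": sorted(added), "removed": sorted(removed)}

# Hypothetical snapshot taken when the index was built...
indexed_at_build = {
    "leave-policy": fingerprint("Employees accrue 25 days of annual leave per year."),
    "travel": fingerprint("Book economy class for flights under six hours."),
}
# ...and the same source system today: one edited document, one new one.
source_today = {
    "leave-policy": "Employees accrue 27 days of annual leave per year.",
    "travel": "Book economy class for flights under six hours.",
    "remote-work": "Staff may work remotely two days per week.",
}
report = stale_documents(indexed_at_build, source_today)
print(report)  # {'changed': ['leave-policy'], 'added': ['remote-work'], 'removed': []}
```

Anything in `changed` or `added` needs re-chunking and re-embedding, and anything in `removed` should be deleted from the index. A fingerprint only detects that a document changed; it says nothing about whether the new content is any good, which is exactly the data-quality dependency RAG inherits.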

Mara: The second approach is MCP, the Model Context Protocol, and it works differently. Instead of querying a pre-built index, the model connects to live systems through MCP servers that wrap those systems' APIs, and queries them in real time. SharePoint, CRM, HR platforms, procurement tools: current, not cached.

Pip: The upshot is the model stops seeing a snapshot of your organisation and starts seeing it as it actually is right now. Which is a meaningfully different thing.
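For the MCP side, it helps to see what "querying a live system" looks like on the wire. MCP messages are JSON-RPC 2.0: a client invokes a tool exposed by an MCP server with a `tools/call` request carrying the tool's `name` and `arguments`, and the server answers with a `content` list, flagging failures inside the tool with `isError`. The sketch below hand-rolls that exchange in-process instead of using an MCP SDK or a real transport such as stdio or HTTP, and the CRM dictionary, the `crm_get_account` tool and the helper functions are all hypothetical.

```python
import json

# Hypothetical stand-in for a live CRM; a real MCP server would call the CRM's API here.
CRM = {"ACME-001": {"name": "Acme Ltd", "status": "prospect"}}

def crm_get_account(account_id: str) -> dict:
    """Read the record as it exists at call time (no cache, no index)."""
    return CRM[account_id]

TOOLS = {"crm_get_account": crm_get_account}  # hypothetical tool registry

def error_response(req_id, code: int, message: str) -> str:
    return json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "error": {"code": code, "message": message}})

def handle(raw: str) -> str:
    """Server side: dispatch a JSON-RPC 2.0 request the way an MCP server handles tools/call."""
    req = json.loads(raw)
    if req.get("method") != "tools/call":
        return error_response(req.get("id"), -32601, "Method not found")
    params = req.get("params", {})
    name = params.get("name")
    if name not in TOOLS:
        # Unknown tools are reported as protocol errors...
        return error_response(req.get("id"), -32602, f"Unknown tool: {name}")
    try:
        data = TOOLS[name](**params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": json.dumps(data)}], "isError": False}
    except Exception as exc:
        # ...while failures inside the tool come back as a result with isError set.
        result = {"content": [{"type": "text", "text": f"Tool failed: {exc!r}"}], "isError": True}
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})

def send_tools_call(req_id: int, name: str, **arguments) -> dict:
    """Client side: build a tools/call request and decode the response."""
    request = {"jsonrpc": "2.0", "id": req_id, "method": "tools/call",
               "params": {"name": name, "arguments": arguments}}
    return json.loads(handle(json.dumps(request)))

first = send_tools_call(1, "crm_get_account", account_id="ACME-001")
CRM["ACME-001"]["status"] = "customer"  # the record changes in the live system
second = send_tools_call(2, "crm_get_account", account_id="ACME-001")
print(first["result"]["content"][0]["text"])   # {"name": "Acme Ltd", "status": "prospect"}
print(second["result"]["content"][0]["text"])  # {"name": "Acme Ltd", "status": "customer"}
```

Because the tool reads the record at call time, the second call sees the status change with no re-indexing step. That is the snapshot-versus-live distinction in miniature, and it is also why a flaky upstream API surfaces directly in the model's answers.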

Mara: The post is also clear these aren’t competitors. RAG handles large, stable knowledge corpora well. MCP handles live, distributed operational data. A well-architected system can use both — MCP as the orchestration layer, with RAG as one of the tools it calls for specific unstructured content.

Pip: And the practical advice is refreshingly un-hyped: start with MCP, since there's no vector infrastructure to provision and no embedding pipelines to maintain. Reach for RAG when semantic retrieval across messy, unstructured content becomes essential.
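The "use both" pattern is easiest to see from the model's side. In MCP, a server advertises its tools in a `tools/list` result, each with a `name`, a `description` and a JSON Schema `inputSchema`, and the model chooses among them. In the sketch below, a RAG-style handbook search is simply one tool sitting next to a live ticket lookup. The corpus, ticket data, tool names and helpers are hypothetical, and word overlap stands in for embedding similarity.

```python
import json

HANDBOOK = [  # hypothetical static corpus, the kind of content RAG indexes well
    "Annual leave requests need manager approval two weeks in advance.",
    "Laptops are refreshed every three years.",
]
OPEN_TICKETS = {"T-42": "waiting on vendor"}  # hypothetical live operational data

def search_handbook(query: str, k: int = 1) -> list[str]:
    """RAG-style retrieval; word overlap stands in for embedding similarity."""
    q = set(query.lower().split())
    return sorted(HANDBOOK, key=lambda doc: len(q & set(doc.lower().split())), reverse=True)[:k]

def ticket_status(ticket_id: str) -> str:
    """Live lookup: read the system of record at call time."""
    return OPEN_TICKETS.get(ticket_id, "unknown ticket")

# name -> (handler, description, input properties, required inputs)
REGISTRY = {
    "search_handbook": (search_handbook, "Find staff handbook passages relevant to a question.",
                        {"query": {"type": "string"}}, ["query"]),
    "ticket_status": (ticket_status, "Current status of a support ticket.",
                      {"ticket_id": {"type": "string"}}, ["ticket_id"]),
}

def list_tools() -> dict:
    """Shape of an MCP tools/list result: name, description and JSON Schema for inputs."""
    return {"tools": [
        {"name": name, "description": desc,
         "inputSchema": {"type": "object", "properties": props, "required": req}}
        for name, (_, desc, props, req) in REGISTRY.items()
    ]}

def call_tool(name: str, arguments: dict):
    """Route a call to whichever handler backs the named tool."""
    handler = REGISTRY[name][0]
    return handler(**arguments)

print(json.dumps(list_tools(), indent=2))
print(call_tool("search_handbook", {"query": "how far in advance to request annual leave"}))
print(call_tool("ticket_status", {"ticket_id": "T-42"}))
```

Nothing in the tool declarations reveals which tool sits on a vector index and which on a live API. That is what lets RAG slot in later as just another tool, once semantic search over unstructured content is worth its infrastructure.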

Mara: There’s a sharp section on what nobody tells you before you commit. RAG brings real infrastructure costs — compute, storage, continuous re-embedding. MCP makes your legacy systems load-bearing: a flaky API or three years of inconsistent CRM data entry becomes a front-and-centre problem, not a peripheral one. Governance and security have to be designed in from the start, not added after.

Pip: The post closes by pointing somewhere interesting — once you have the interface and the plumbing, agents become possible. Models that don’t just answer, but act.

Mara: That’s the thread the next piece picks up.


Pip: The sealed-room framing is the one that sticks — capable model, no current information, and everything that follows is just different ways of passing notes.

Mara: And the note about agents acting rather than answering suggests the architecture conversation is only getting more consequential from here.
