
Before we look at what a modern data platform actually looks like, it’s worth pausing to ask: who is it built for?
Platforms don’t create value on their own. People do. And in a world that runs on data and AI, a set of roles sits at the heart of that value chain: the people who design, build, and consume the pipelines that turn raw data into decisions. Understanding who they are, and what they actually need, changes how you think about everything else.
The first person worth introducing is Charlie — the Data Engineer.
Why the Data Engineer exists
Not long ago, data lived inside systems. The database behind the ERP. The CRM. The billing platform. IT looked after servers, storage, and availability. Business teams raised tickets when they wanted a report. That model worked when data was a by-product of running the business.
It doesn’t work like that any more.
When analytics needs to be near real-time rather than batched once a month, when AI teams need large consistent training sets and continuous feature feeds, and when data scientists are expected to build models that drive real decisions — someone has to sit between raw application data and the people and systems consuming it.
That’s Charlie.
What Charlie actually does
At its core, Charlie’s job is to move data reliably through its lifecycle — from raw and messy, to clean, modelled, and ready for use. If a dataset appears in a dashboard, a model, or an AI assistant, somewhere behind it is a data engineer making it flow.
Think back to the refinery analogy. If raw data is crude oil, Charlie is the refinery engineer — designing and operating the pipelines, monitoring the process, and making sure the right grade of fuel reaches the right engine at the right time.
In practice that means four things.
First, building and operating pipelines that handle ingestion, transformation, storage, and serving of data — repeatably, reliably, and at production grade. Not one-off scripts that work once and break on a Monday morning.
Second, turning messy source data into trustworthy data products — clean, modelled, documented, and timely. Tables and views that reflect how the business actually thinks: customers, products, assets, events. Data as a product, not just a dump from a system.
Third, making pragmatic technology choices. It’s easy to chase the latest tool or architectural pattern. Charlie doesn’t have that luxury. Every decision — warehouse, lake, lakehouse, stream processor, orchestration engine — has to be weighed against cost, performance, operational reality, and whether it will actually play nicely with the rest of the stack.
Fourth, making governance real. Data quality isn’t a policy document; it lives in the pipelines Charlie builds and operates. Validation checks, lineage tracking, access controls, schema versioning — this is where the business’s aspiration for a “single source of truth” either becomes reality or stays a slide on a deck.
And through all of it, Charlie collaborates — with architects who set direction, analysts who know what business users need, scientists who know what their models require, and stakeholders who own the outcomes. When that collaboration works, projects move from proof of concept to production. When it doesn’t, you get dashboards nobody trusts and models that never leave the lab.
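To make the "governance lives in the pipeline" point a little more concrete, here is a minimal sketch in Python of a validation gate sitting between ingestion and transformation. Everything in it is an assumption made for illustration: the customer schema, the field names, the function names, and the 50% rejection threshold are all hypothetical, and a real pipeline would more likely lean on an orchestration engine and a dedicated data quality tool than on hand-rolled checks.

```python
from dataclasses import dataclass

# Hypothetical schema for illustration: field name -> expected Python type.
CUSTOMER_SCHEMA = {"customer_id": int, "email": str, "country": str}


@dataclass
class ValidationResult:
    valid: list
    rejected: list


def validate(rows, schema):
    """Split rows into those that match the schema and those that do not."""
    valid, rejected = [], []
    for row in rows:
        ok = all(
            isinstance(row.get(field), expected)
            for field, expected in schema.items()
        )
        (valid if ok else rejected).append(row)
    return ValidationResult(valid, rejected)


def transform(rows):
    """Normalise fields so downstream consumers see consistent values."""
    return [
        {
            **row,
            "email": row["email"].strip().lower(),
            "country": row["country"].upper(),
        }
        for row in rows
    ]


def run_pipeline(raw_rows, max_reject_ratio=0.5):
    """Validate, then transform; fail loudly if too many rows are rejected."""
    result = validate(raw_rows, CUSTOMER_SCHEMA)
    if raw_rows and len(result.rejected) / len(raw_rows) > max_reject_ratio:
        # Assumed policy: better to publish nothing than to publish bad data.
        raise ValueError("too many rejected rows; refusing to publish")
    return transform(result.valid), result.rejected


# Made-up sample input: one clean row, one with a wrongly typed customer_id.
raw = [
    {"customer_id": 1, "email": " Ana@Example.com ", "country": "gb"},
    {"customer_id": "2", "email": "bob@example.com", "country": "us"},
]
clean, rejected = run_pipeline(raw)
```

The design choice worth noticing is that rejected rows are returned rather than silently dropped, and the whole run fails when the rejection rate crosses the threshold. That is the difference between a quality policy on a slide and one the pipeline actually enforces.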
Why this matters
Understanding Charlie’s world reframes what a data platform needs to be. It’s not just storage and compute. It’s the foundation that keeps pipelines reliable, helps teams serve analysts and scientists faster, and makes a complex job simpler rather than harder.
In the next post, we’ll meet the colleagues downstream of Charlie — the data analyst and the data scientist — and see how all three fit into the same data value chain we’ve been building through this series.