For most of software’s history, the intelligence was in the code. Today, it’s in the data. That shift changes everything — especially what you need to invest in.

[Figure: Traditional programming versus machine learning. Left: a flowchart of if/then statements leading to a predetermined result. Right: a data funnel feeding a network that discovers patterns, showing the programmer's role shifting from writing code to curating data.]

I had one of those shower moments this morning. You know the ones — your brain wanders somewhere unexpected and suddenly you’re solving a problem you weren’t trying to solve.

I was thinking about the time I taught my son to code a robot we’d built together. The code was beautifully simple. Turn left. Move three steps. Turn right. If you hit a wall, stop. Pure logic. Pure rules. We wrote the instructions, the robot followed them, and when it did something wrong we went back and fixed the rule.

And then I thought: when I’m a grandad — no rush — and I’m sitting down with a grandchild to do the same thing, the conversation is going to look completely different. We won’t be writing turn-left-move-three-steps rules. We’ll be feeding the robot data. We’ll be talking about what it sees, the patterns it learns from, how it gets better not because we updated the instructions but because we gave it more examples to learn from. Computer vision. Convolutional neural networks. A robot that figures out the world rather than following a script we wrote for it.

Same robot. Completely different philosophy. And somewhere between teaching my son and the future grandkids, software itself made that same journey.

For most of computing’s history, we built systems by encoding our understanding of the world directly into logic. If this, then that. If the balance is below zero, deny the transaction. If the email contains “free money” and ten exclamation marks, mark it as spam. Engineers wrote the rules, shipped the code, and the system behaved exactly as specified. The intelligence lived in the logic.

That model hasn’t disappeared — but in the domains that matter most today, it’s no longer the whole story. It’s all about data now. Patterns in the data. And understanding that shift changes how you think about what you need to invest in.


“Software is eating the world,” Marc Andreessen wrote in 2011. He was right. What he didn’t mention was that software itself would eventually be powered by data. Follow that thought to its conclusion and the most important infrastructure in your organisation isn’t your application stack. It’s your data platform, because data is the power source.

When Rules Stop Being Enough

Rules-based systems are genuinely good at what they do. They’re predictable. They’re auditable. If something goes wrong, you can usually point at the line of code that caused it. For stable, well-understood processes — tax calculations, eligibility checks, simple approvals — they’re entirely fit for purpose.

The trouble starts when the problem gets messy.

Take fraud detection. You start sensibly: flag transactions above a certain amount from high-risk locations. Block IPs on a denylist. Limit transactions per minute. Clean, logical, explainable.
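That starting rule set might look something like this in code. This is a minimal sketch. The field names, the amount limit, the placeholder country codes and the denylisted IP are all hypothetical, chosen only to show what hand-written decision logic looks like.

```python
from dataclasses import dataclass

# Hypothetical rule parameters, for illustration only.
AMOUNT_LIMIT = 10_000
HIGH_RISK_COUNTRIES = {"XX", "YY"}   # placeholder country codes
IP_DENYLIST = {"203.0.113.7"}        # documentation-range IP address
MAX_TXNS_PER_MINUTE = 5


@dataclass
class Transaction:
    amount: float
    country: str
    ip: str
    txns_last_minute: int


def is_suspicious(txn: Transaction) -> bool:
    """Hand-written rules: all of the decision logic lives in this code."""
    if txn.amount > AMOUNT_LIMIT and txn.country in HIGH_RISK_COUNTRIES:
        return True
    if txn.ip in IP_DENYLIST:
        return True
    if txn.txns_last_minute > MAX_TXNS_PER_MINUTE:
        return True
    return False


print(is_suspicious(Transaction(12_000, "XX", "198.51.100.1", 1)))  # True
print(is_suspicious(Transaction(50, "GB", "198.51.100.1", 1)))      # False
```

Every new fraud pattern means another `if`. That is how the sprawl described next begins.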

Then the fraudsters adapt. New attack vectors. New geographies. New patterns you didn’t anticipate. So you add more rules. Then exceptions to those rules. Then special handling for VIPs. Then manual overrides for partners. Before long, you’ve got thousands of conditions, constant firefighting, and a system that’s simultaneously brittle and impossible to fully understand — despite being built entirely from logic you wrote yourself.

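At some point you hit rule sprawl. Every new rule risks colliding with an old one, and nobody can say with confidence how the system will behave. It doesn’t end well.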


The Shift: From Code as Truth to Data as Truth

Machine learning doesn’t try to specify the decision logic. Instead, it learns it — directly from examples.

Feed a model enough confirmed fraud cases alongside confirmed legitimate transactions. Give it the signals: transaction history, device fingerprint, location patterns, time of day, merchant data. Let it find the patterns. Then, when the fraud landscape shifts, you don’t sit down and rewrite hundreds of rules. You gather new examples, update the signals, retrain, and redeploy.
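Here is a deliberately tiny learner that makes the inversion concrete. It does not hard-code a threshold. Instead, it searches labelled examples for the single feature and cut-off that best separate fraud from legitimate transactions (a "decision stump"). Real fraud models are far richer, and the feature names and example data here are hypothetical. The point of the sketch is that retraining on new examples changes the behaviour without anyone editing the decision logic.

```python
from typing import Dict, List, Tuple

Example = Tuple[Dict[str, float], int]  # (features, label): 1 = fraud, 0 = legitimate
Model = Tuple[str, float]               # (feature name, threshold)


def train_stump(examples: List[Example]) -> Model:
    """Learn the single (feature, threshold) rule that misclassifies the fewest
    examples, where the rule predicts fraud when the feature value > threshold."""
    best, best_errors = None, None
    for feature in sorted(examples[0][0]):
        for threshold in sorted({x[feature] for x, _ in examples}):
            errors = sum((x[feature] > threshold) != bool(y) for x, y in examples)
            if best_errors is None or errors < best_errors:
                best, best_errors = (feature, threshold), errors
    return best


def predict(model: Model, x: Dict[str, float]) -> int:
    feature, threshold = model
    return int(x[feature] > threshold)


# Hypothetical historical examples: large amounts were the fraud signal.
history = [
    ({"amount": 20, "new_device": 0}, 0),
    ({"amount": 35, "new_device": 1}, 0),
    ({"amount": 900, "new_device": 0}, 1),
    ({"amount": 1200, "new_device": 1}, 1),
]
# Hypothetical recent examples: fraudsters switch to small payments from new devices.
recent = [
    ({"amount": 15, "new_device": 1}, 1),
    ({"amount": 30, "new_device": 1}, 1),
    ({"amount": 25, "new_device": 0}, 0),
    ({"amount": 60, "new_device": 0}, 0),
]

model_v1 = train_stump(history)  # ('amount', 35)
model_v2 = train_stump(recent)   # ('new_device', 0)
print(predict(model_v1, {"amount": 20, "new_device": 1}))  # 0
print(predict(model_v2, {"amount": 20, "new_device": 1}))  # 1
```

Nobody rewrote `train_stump` between the two models; only the examples changed. That is the inversion in miniature. It is also why the quality of those examples matters so much.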

This is a fundamental inversion of where the intelligence lives:

  • In a rules-based system, code is the truth and data is just something to test against it.
  • In a machine learning system, data is the truth and code is the plumbing that carries it.

The code still matters enormously — it defines how data flows, how features are built, how models are trained and served. But the behaviour you see in production is now overwhelmingly a function of which data you chose, how you cleaned and joined it, how frequently you refresh it, and how well you’ve engineered the signals from it.

The same model architecture, trained on different data, can behave like a completely different product.


Output Is a Function of Input

This is the point that gets lost in conversations about AI.

Organisations invest heavily in models. They debate architectures, benchmark performance, evaluate vendors. All of that matters. But if the data flowing into those models is incomplete, inconsistent, biased or stale, no amount of model sophistication will save you.

As I covered in “Garbage In, Expensive Garbage Out”, the dangerous thing about modern AI isn’t that it fails obviously when the data is bad. It’s that it doesn’t. It learns whatever patterns you give it, optimises confidently for whatever labels you’ve defined, and delivers outputs at scale, even when those outputs are wrong.

A refinery is a useful metaphor here. You can have the most sophisticated downstream process in the world, but if contaminated feedstock gets through the early stages, it doesn’t matter how good the refining is: what comes out the other end is still wrong. Processed wrong. Delivered at scale, with complete confidence, in entirely the wrong direction.

Output is a direct function of input. That’s not a caveat. It’s the whole game.


Data Engineering: The Refinery That Makes AI Possible

This is why data engineering has moved from the back office to the front line.

When your AI systems run on data rather than rules, the infrastructure that produces, transforms, governs and delivers that data isn’t supporting the product. It is the product — or at least, it’s what makes the product possible.

Think back to the refinery. Raw crude oil has no value in your car’s engine. It needs to go through a series of deliberate transformation stages — each one removing impurities, each one producing something more usable — before it becomes fuel you can rely on. Data works the same way.

Raw operational data, logs, clickstreams, sensor readings — these are the crude oil. Valuable in potential, useless in practice. To become model-ready, they need to flow through robust pipelines: ingested reliably, cleaned and validated, standardised across sources, transformed into features that actually capture signal, and governed throughout so you know what you have, where it came from, and whether it can be trusted.

That’s the job of the data engineer. And in a world where AI output depends on data input, that job sits at the heart of everything.

A few things have to be true for the refinery to work:

The pipelines have to be reliable. Ingestion from operational systems, logs, events and sensors. Batch and streaming paths where appropriate. Resilience to schema changes, late events and upstream failures. Without this, models starve, drift, or silently degrade on stale inputs.
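One small piece of that resilience can be sketched as follows. Incoming records are normalised against an expected schema: new fields are tolerated, and records missing required fields are quarantined. Events that arrive behind a watermark are flagged rather than silently dropped. The schema, the field names and the ten-minute allowed lateness are hypothetical. Production systems typically rely on a schema registry and a stream processor's own watermarking rather than hand-rolled code like this.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Tuple

# Hypothetical contract: required fields, plus optional fields with defaults.
REQUIRED = {"event_id", "user_id", "event_time"}
OPTIONAL_DEFAULTS = {"channel": "unknown"}
KNOWN = REQUIRED | set(OPTIONAL_DEFAULTS)
ALLOWED_LATENESS = timedelta(minutes=10)  # hypothetical tolerance

Record = Dict[str, Any]


def ingest(records: List[Record], watermark: datetime) -> Tuple[List[Record], List[Record], List[Record]]:
    """Split raw records into (accepted, late, quarantined) instead of failing the batch."""
    accepted, late, quarantined = [], [], []
    for raw in records:
        if not REQUIRED.issubset(raw):
            quarantined.append(raw)  # upstream schema broke: keep it for inspection
            continue
        # Drop unknown fields and fill defaults, so a new upstream column
        # or a missing optional one doesn't break downstream consumers.
        record = {**OPTIONAL_DEFAULTS, **{k: v for k, v in raw.items() if k in KNOWN}}
        if record["event_time"] < watermark - ALLOWED_LATENESS:
            late.append(record)  # route to a backfill/correction path, not the bin
        else:
            accepted.append(record)
    return accepted, late, quarantined


watermark = datetime(2024, 1, 1, 12, 0)
batch = [
    {"event_id": 1, "user_id": "u1", "event_time": datetime(2024, 1, 1, 11, 58), "new_col": "x"},
    {"event_id": 2, "user_id": "u2", "event_time": datetime(2024, 1, 1, 11, 30)},
    {"event_id": 3, "event_time": datetime(2024, 1, 1, 11, 59)},  # missing user_id
]
accepted, late, quarantined = ingest(batch, watermark)
print(len(accepted), len(late), len(quarantined))  # 1 1 1
```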

The data has to be properly modelled. Standardised schemas and clear contracts between the systems that produce data and the teams that consume it. Deduplication, validation and anomaly detection built into the pipeline, not bolted on as an afterthought. Consistent definitions of what “customer” means, what “churn” means, what “conversion” means — because if those definitions vary across systems, your model is quietly learning the noise between them.
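The sketch below shows checks built into a pipeline step rather than bolted on afterwards, in three stages. It deduplicates on a business key and keeps the latest version. It rejects rows that break simple validation rules. It flags values that look anomalous against the batch median. The field names, the accepted currencies and the "ten times the median" rule are hypothetical placeholders; real anomaly detection is usually statistical and tuned per dataset.

```python
from statistics import median
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
VALID_CURRENCIES = {"GBP", "EUR", "USD"}  # hypothetical data contract
ANOMALY_FACTOR = 10  # hypothetical rule: flag amounts above 10x the batch median


def clean_batch(rows: List[Row]) -> Tuple[List[Row], List[Row], List[Row]]:
    """Return (clean, rejected, flagged); flagged rows stay in clean for review."""
    # 1. Deduplicate on txn_id, keeping the most recently updated version.
    latest: Dict[Any, Row] = {}
    for row in rows:
        key = row["txn_id"]
        if key not in latest or row["updated_at"] > latest[key]["updated_at"]:
            latest[key] = row

    # 2. Validate against the contract; reject rather than silently "fix".
    clean, rejected = [], []
    for row in latest.values():
        if row["amount"] >= 0 and row["currency"] in VALID_CURRENCIES:
            clean.append(row)
        else:
            rejected.append(row)

    # 3. Flag outliers relative to the batch median (a robust baseline).
    mid = median(r["amount"] for r in clean) if clean else 0
    flagged = [r for r in clean if mid > 0 and r["amount"] > ANOMALY_FACTOR * mid]
    return clean, rejected, flagged


rows = [
    {"txn_id": "a", "amount": 10.0, "currency": "GBP", "updated_at": 1},
    {"txn_id": "a", "amount": 12.0, "currency": "GBP", "updated_at": 2},  # newer version
    {"txn_id": "b", "amount": 11.0, "currency": "EUR", "updated_at": 1},
    {"txn_id": "c", "amount": -5.0, "currency": "GBP", "updated_at": 1},  # invalid amount
    {"txn_id": "d", "amount": 9.0, "currency": "XYZ", "updated_at": 1},   # invalid currency
    {"txn_id": "e", "amount": 500.0, "currency": "USD", "updated_at": 1},
]
clean, rejected, flagged = clean_batch(rows)
print([r["txn_id"] for r in clean])     # ['a', 'b', 'e']
print([r["txn_id"] for r in rejected])  # ['c', 'd']
print([r["txn_id"] for r in flagged])   # ['e']
```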

Features need to be treated as first-class assets. The signals you engineer from raw data — the features a model actually learns from — should be reusable, versioned and governed. Computed consistently whether you’re training offline or serving in real time. Not scattered across one-off notebook scripts that no one else can maintain.
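One common way to avoid training/serving skew is to define each feature once and call the same function from both the offline training path and the online serving path. Below is a minimal sketch that assumes a hypothetical transaction history held as a list of (timestamp, amount) pairs. A feature store generalises the same idea with storage, versioning and point-in-time lookups.

```python
from typing import Dict, List, Tuple

History = List[Tuple[int, float]]  # hypothetical: (unix timestamp in seconds, amount)


def compute_features(now: int, amount: float, history: History) -> Dict[str, float]:
    """The single definition of these features, shared by training and serving."""
    # Only transactions in the hour strictly before `now`, so training never
    # sees information that wouldn't have been available at decision time.
    window = [a for t, a in history if now - 3600 <= t < now]
    avg = sum(window) / len(window) if window else 0.0
    return {
        "amount": amount,
        "txns_last_hour": float(len(window)),
        "amount_vs_hourly_avg": amount / avg if avg else 0.0,
    }


def build_training_rows(events: List[Tuple[int, float]], history: History) -> List[Dict[str, float]]:
    """Offline path: replay past events through the shared definition."""
    return [compute_features(t, a, history) for t, a in events]


def serve_features(now: int, amount: float, history: History) -> Dict[str, float]:
    """Online path: calls exactly the same function at request time."""
    return compute_features(now, amount, history)


history = [(1000, 10.0), (2000, 20.0), (5000, 30.0)]
offline = build_training_rows([(4000, 30.0)], history)[0]
online = serve_features(4000, 30.0, history)
print(offline == online)  # True
print(online)  # {'amount': 30.0, 'txns_last_hour': 2.0, 'amount_vs_hourly_avg': 2.0}
```

Suppose the offline and online paths each kept their own copy of this logic. A small difference between them (say, `<=` versus `<` on the window boundary) would mean the model is served features it was never trained on.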

Governance can’t be an afterthought. As AI moves closer to consequential decisions — credit, healthcare, hiring, public sector — knowing which data fed which model, who had access to it, and whether it was fit for that purpose stops being a compliance tick-box and becomes part of the safety story.
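At its simplest, knowing which data fed which model means recording an immutable fingerprint of each training input alongside the model identifier, its intended purpose and who had access. Below is a minimal sketch that uses a content hash from Python's standard `hashlib`. The dataset name, model identifier, team names and record fields are hypothetical. A real platform would store this in a catalogue or lineage service rather than in a dataclass.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Tuple


def fingerprint(rows: List[dict]) -> str:
    """Deterministic SHA-256 of a dataset's contents (row order matters, key order doesn't)."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class LineageRecord:
    model_id: str
    datasets: Dict[str, str]          # dataset name -> content fingerprint
    approved_purpose: str             # what this data was deemed fit for
    granted_access: Tuple[str, ...]   # who could read the training data
    created_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


train = [{"txn_id": "a", "amount": 12.0, "label": 0}]
record = LineageRecord(
    model_id="fraud-model-v2",  # hypothetical identifier
    datasets={"transactions_train": fingerprint(train)},
    approved_purpose="fraud screening",
    granted_access=("data-platform-team", "fraud-analytics"),
)
print(record.model_id, record.datasets["transactions_train"][:12])
```

Because the hash changes if a single value changes, the record can later show whether a model was trained on exactly the data its owners believe it was.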

The loop has to close. How you capture feedback from production — user interactions, implicit signals, explicit labels — and turn it into the next generation of training data is where the compounding advantage comes from. The refinery doesn’t run once. It runs continuously.
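Closing the loop usually starts with joining what the model predicted to what actually happened. Below is a minimal sketch that assumes a hypothetical prediction log and a separate source of confirmed outcomes (chargebacks, analyst reviews), both keyed by the same transaction id. Predictions still awaiting an outcome are held back rather than guessed.

```python
from typing import Dict, List, Tuple


def build_next_training_set(
    predictions: List[Dict],   # hypothetical log entries: {"txn_id", "features", "score"}
    outcomes: Dict[str, int],  # txn_id -> confirmed label (1 = fraud, 0 = legitimate)
) -> Tuple[List[Tuple[Dict, int]], List[str]]:
    """Return (labelled examples for retraining, txn_ids still awaiting a label)."""
    labelled, pending = [], []
    for p in predictions:
        label = outcomes.get(p["txn_id"])
        if label is None:
            pending.append(p["txn_id"])
        else:
            # Pair the features the model actually saw with the ground truth.
            labelled.append((p["features"], label))
    return labelled, pending


predictions = [
    {"txn_id": "t1", "features": {"amount": 20}, "score": 0.1},
    {"txn_id": "t2", "features": {"amount": 900}, "score": 0.8},
    {"txn_id": "t3", "features": {"amount": 15}, "score": 0.2},
]
outcomes = {"t1": 0, "t3": 1}
labelled, pending = build_next_training_set(predictions, outcomes)
print(labelled)  # [({'amount': 20}, 0), ({'amount': 15}, 1)]
print(pending)   # ['t2']
```

The model scored `t3` as low risk, but it turned out to be fraud. That is exactly the kind of example that makes the next model better.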


Generative AI Turns the Dial Up, Not Off

It’s tempting to think that large language models and generative AI change this equation — that you can just point a capable model at your questions and bypass the data engineering work.

The opposite is true.

Behind every enterprise generative AI application that actually works, there are pipelines fetching the right context from your knowledge bases and data warehouses in real time. There are curated fine-tuning datasets steering the model toward the behaviour you actually want. There are feedback loops turning user interactions into better training data over time. There is, in short, a refinery — just with a different interface at the end of it.
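The retrieval step can be sketched in miniature. Score documents against the question, keep the best matches, and assemble them into the context the model is given. To run on the standard library alone, this sketch uses plain word overlap (cosine similarity over word counts). Enterprise systems typically use embedding models, vector indexes and re-rankers instead. The documents and the prompt format are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def _vector(text: str) -> Counter:
    """Bag-of-words counts over lower-cased alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, docs: List[str], k: int = 2) -> List[str]:
    """Rank documents by similarity to the question and keep the top k."""
    q = _vector(question)
    return sorted(docs, key=lambda d: cosine(q, _vector(d)), reverse=True)[:k]


def build_prompt(question: str, docs: List[str], k: int = 2) -> str:
    context = "\n".join(f"- {d}" for d in retrieve(question, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


docs = [
    "Refunds are processed within 14 days of the return being received.",
    "Our offices are closed on public holidays.",
    "Expense claims must be submitted within 30 days.",
]
question = "How many days until refunds are processed?"
print(retrieve(question, docs, k=1))  # ['Refunds are processed within 14 days of the return being received.']
```

Swap in better scoring and the shape stays the same. What the model can answer still depends on what this step finds, and on how clean and current the underlying documents are.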

For enterprise use cases, the differentiator is rarely the base model. It’s the quality of the data you connect it to, the rigour of the retrieval and ranking pipelines behind it, and the discipline of the data engineering that makes all of that reliable.

The plumbing is still the point.


If Data Is the Engine, Build the Right Infrastructure

The organisations that are winning with AI aren’t simply the ones with the biggest models. They’re the ones who treat data engineering as a first-class product capability — where data engineers and platform architects are in the room from the start, not brought in to implement decisions that have already been made.

They invest early in shared platform infrastructure: data lakes and warehouses, feature stores, catalogues, quality monitoring, governance and observability. Not one-off pipelines per project, but a proper refinery that serves the whole organisation.

And they build on foundations that can handle the scale and complexity of real enterprise data estates — structured tables alongside documents, images, logs and sensor data; on-premises alongside cloud and edge; batch pipelines alongside real-time streams.

That’s exactly what the Dell AI Data Platform is designed to support: a unified, modular foundation for storing, processing, governing and serving the data that modern AI workloads depend on — so data engineers can focus on building the refinery, rather than firefighting the infrastructure it sits on.


The Refinery Has to Work

The shift from rules-based systems to data-driven AI didn’t just give us more powerful software. It changed where the intelligence lives — and with it, what we need to invest in to make that software trustworthy.

When code was the truth, the bottleneck was engineers writing rules. When data is the truth, the bottleneck is the infrastructure that produces, refines, governs and delivers that data.

The refinery has to work. The pipelines have to be reliable. The fuel has to be clean. Everything downstream — every model, every decision, every output — depends on it.

And if you want to understand what happens when the refinery fails, that’s a story worth reading too.
