The Shoebox
Before streaming, my music lived on CDs. No playlists, no algorithms, no recommendations. Just a pile of discs in a shoebox and an afternoon to fill.
Every so often I’d try to organise it. Not alphabetically, which always felt a bit lifeless, but by feel. The driving albums. The get-romantic albums. The dance-in-the-kitchen albums for when no one’s home.
Nobody told me what those categories were. There were no labels to copy, no genres printed on the spine that captured what I actually meant. I just listened, paid attention, and started to notice what belonged together.
Without knowing it, I was doing the analogue version of something machines now do at industrial scale.
It’s called unsupervised learning. And it might be the most quietly remarkable thing in the whole machine learning story.
Pull the Labels Off
In Part 2, we explored supervised learning — where a model studies thousands of labelled examples, finds the pattern connecting inputs to outputs, and uses it to predict something new. The label is the teacher. The cheat sheet is always there.
Unsupervised learning pulls the cheat sheet away entirely.
The data arrives with no correct answers attached. No column marked “right” or “wrong.” No target to aim for. Just raw information and the instruction: see what you can find.
This is where things get genuinely interesting. Because some of the most valuable patterns in the world haven’t been labelled yet — not because nobody cared, but because nobody knew they were there.
Two Ways of Seeing in the Dark
Unsupervised learning tends to show up in two broad forms.
The first is clustering — the shoebox problem. You’re not told what the groups should be. You look at what’s similar and let the structure emerge. Machines do this with customers, documents, transactions, sensor readings — treating each example as a point in space and finding the natural islands of similarity that form within it. The algorithm finds the groups. You decide what they mean.
The second is dimensionality reduction — a way of finding what’s really going on underneath data that looks impossibly complex. Real-world datasets can have hundreds of variables. Dimensionality reduction compresses them down into a smaller number of hidden patterns — the underlying forces that explain most of what’s happening. In music data those forces might feel like energy, mood, or tempo. In customer data they might feel like loyalty, price sensitivity, or seasonality. Nobody defines those dimensions in advance. The data reveals them.
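For readers who like to see the idea in code, here is a minimal sketch of the simplest form of dimensionality reduction: finding the single direction that explains most of the variation, which is the first step of principal component analysis. The track features and their values are invented for illustration, and the power-iteration approach is just one simple way to find that direction, not necessarily how a production library does it.

```python
import math

# Hypothetical track features, made up for illustration:
# (tempo, loudness, danceability), each on a rough 0-10 scale.
tracks = [
    (2.0, 1.5, 2.5), (3.0, 3.5, 3.0), (5.0, 4.5, 5.5),
    (6.0, 6.5, 6.0), (8.0, 7.5, 8.5), (9.0, 9.5, 9.0),
]

def covariance_matrix(rows):
    """Return the sample covariance matrix and the mean-centred rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[i] for r in rows) / n for i in range(d)]
    centred = [[r[i] - means[i] for i in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centred

def top_component(cov, steps=200):
    """Power iteration: multiply and renormalise until the dominant direction emerges."""
    v = [1.0] * len(cov)
    for _ in range(steps):
        w = [sum(row[j] * v[j] for j in range(len(v))) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance of the data along the direction v.
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(len(v)))
                   for i in range(len(v)))
    return v, variance

cov, centred = covariance_matrix(tracks)
direction, variance_along = top_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
share = variance_along / total_variance

# One number per track: its position along the hidden direction.
scores = [sum(c[i] * direction[i] for i in range(len(direction))) for c in centred]

print(f"share of variance explained by one direction: {share:.2f}")
print([round(s, 2) for s in scores])
```

Three columns collapse into one score per track with very little lost. The code never names that score. Calling it "energy" is our contribution, not the algorithm's.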
In both cases, the machine finds the structure. We supply the language.
From Shoebox to Segments
Step out of the shoebox and into something closer to the real world: a retailer with fifty thousand customers and no idea how to talk to them differently.
The data is all there — purchase frequency, spend, product categories, response to promotions. What isn’t there is a neat label saying budget buyer or loyal advocate or one-off bargain hunter. Those categories, if they exist at all, live only in someone’s intuition.
A clustering algorithm doesn’t need the labels. It looks at how customers behave, measures the distances between them in that behaviour space, and finds the natural groupings that form. What comes out the other side isn’t a category from a textbook — it’s something that emerged from the data itself.
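As a rough sketch of what "measuring distances and finding groupings" can look like, here is a tiny k-means implementation in plain Python. Everything specific in it is an assumption made for illustration: the nine customers, the two features, the choice of three clusters, and the deterministic farthest-first starting points (real libraries usually start from random or smarter initial guesses).

```python
import math

# Hypothetical customers, invented for illustration:
# (purchases per year, average basket value).
customers = [
    (2, 310.0), (1, 280.0), (3, 295.0),    # buy rarely, spend a lot
    (40, 22.0), (36, 18.0), (44, 25.0),    # buy constantly, spend little
    (12, 60.0), (10, 55.0), (14, 65.0),    # somewhere in between
]

def standardise(rows):
    """Rescale each column to mean 0 and standard deviation 1,
    so that no feature dominates the distance just because of its units.
    Assumes every column has some variation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c))
            for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(row, means, stds))
            for row in rows]

def kmeans(points, k, iterations=100):
    """Return one cluster label per point."""
    # Deterministic farthest-first start: begin with the first point,
    # then keep adding the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # nothing moved, so the grouping has settled
        labels = new_labels
        # Move each centroid to the average of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return labels

labels = kmeans(standardise(customers), k=3)
for customer, label in zip(customers, labels):
    print(label, customer)
```

The output is only a column of numbers. Deciding that one group is "promotion chasers" and another is "big occasional spenders" is still a human job.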
These customers buy rarely but spend significantly when they do. These buy constantly but only when there’s a promotion. These arrived in a burst eighteen months ago and then quietly disappeared.
Nobody wrote those segments. The data found them. And once you can see them, you can act on them — which is exactly the kind of insight that scales from a single analyst’s instinct to an enterprise decision engine. (The infrastructure that makes that scale possible is something we explore in our [Data and AI series].)
Learning Without a Score
With supervised learning, the feedback loop is clean. Predict. Compare to the label. Adjust. The model always knows how wrong it was.
In unsupervised learning there is no label, so there is no score in that sense. The algorithm isn’t trying to match a known answer — it’s trying to find the most meaningful structure it can, given only the data in front of it.
That shifts what “doing well” means. A good clustering result isn’t one that matches a predefined answer. It’s one where the groups are genuinely distinct, internally coherent, and — crucially — useful to the people working with them. An elegant algorithm that produces segments nobody can act on has failed, regardless of its technical score.
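One common technical score for "distinct and internally coherent" is the silhouette coefficient, which compares how close each point sits to its own group with how close it sits to the nearest other group. The sketch below is a plain-Python version written for illustration, with made-up points and two hand-written groupings. It measures separation only and says nothing about whether anyone can act on the result.

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient.
    Near 1: tight, well-separated groups. Near 0 or below: overlapping groups."""
    scores = []
    for i, p in enumerate(points):
        # a: average distance to the other members of the point's own group.
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # usual convention for a group of one
            continue
        a = sum(own) / len(own)
        # b: average distance to the members of the nearest other group.
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Made-up points forming two obvious islands.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
          (5.0, 5.0), (5.1, 4.8), (4.9, 5.2)]

good = silhouette(points, [0, 0, 0, 1, 1, 1])  # grouping follows the islands
bad = silhouette(points, [0, 1, 0, 1, 0, 1])   # grouping ignores them

print(f"sensible grouping: {good:.2f}")
print(f"arbitrary grouping: {bad:.2f}")
```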
This is why unsupervised learning asks more of the humans involved, not less. The machine surfaces the pattern. The judgement about whether that pattern matters still belongs to us.
When It Goes Wrong
Unsupervised learning can go wrong in ways that are subtler than those of supervised learning, and sometimes harder to catch.
Given enough freedom, almost any dataset can be divided into clusters. That doesn’t mean those clusters are real. The most dangerous outcome isn’t a model that fails visibly — it’s one that produces beautiful, confident-looking segments that collapse the moment someone who knows the business actually looks at them.
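A small demonstration of that first point, offered as an illustration rather than a proof: the sketch below runs a one-dimensional k-means on a hundred evenly spaced numbers, data with no groups in it at all. The data, the choice of three clusters, and the starting centroids are all assumptions made for the example.

```python
def kmeans_1d(values, centroids, iterations=50):
    """Plain k-means on a list of numbers; returns (centroids, labels)."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assign each value to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
                  for v in values]
        # Move each centroid to the mean of its members (keep it if it has none).
        new_centroids = []
        for j in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centroids.append(sum(members) / len(members) if members
                                 else centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

# A perfectly featureless line of numbers from 0 to 1: no islands anywhere.
values = [i / 99 for i in range(100)]

centroids, labels = kmeans_1d(values, [values[0], values[49], values[99]])
sizes = [labels.count(j) for j in range(3)]

print("cluster sizes:", sizes)
print("centroids:", [round(c, 2) for c in centroids])
```

Ask for three clusters and you get three clusters, each with a tidy centre and a respectable head count. Nothing in the output warns you that the boundaries are arbitrary.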
There’s also the temptation to storytell on top of weak patterns. Once you have groups, it’s very easy to stand back and say: ah yes, these are clearly our innovators, these are our laggards. Sometimes that’s true. Sometimes it’s one variable you forgot to account for, dressed up in a compelling narrative.
And the ethical dimension doesn’t disappear just because the labels do. If certain groups are underrepresented in your data, the patterns that emerge will reflect that absence — quietly, confidently, and without a warning sign. As we explored in Part 1 and Part 2, the quality of data is never just a technical question.
Siblings, Not Rivals
It’s tempting to think of supervised and unsupervised learning as separate worlds. One with labels, one without. One for prediction, one for exploration.
In practice they constantly work together. Unsupervised methods often do the groundwork — simplifying, exploring, revealing structure — that makes a supervised model possible. Supervised labels can be tested against unsupervised findings to check whether what you think the categories are matches what the data actually shows.
Miss Smith’s straight line and the shoebox of CDs aren’t different stories. They’re different chapters of the same one.
Closer Than You Think
When I was sorting that shoebox, I wasn’t thinking about algorithms. I was just following my ears — noticing what felt similar, grouping it, deciding what the groups meant.
Today, unsupervised learning does that for millions of customers, billions of web events, and oceans of sensor data that no human could sort by feel. No labels. No cheat sheet. Just structure, pulled out of the noise.
In the next post, we’ll look at reinforcement learning using a computer game analogy.
Because that’s when it stops being interesting and starts being important.
