K-Means forced everyone into a box. DBSCAN found the whales hiding in the corner.
The setup
Tindo is a marketplace for handmade goods, with hundreds of thousands of buyers and behavior all over the map, from once-a-year gift shoppers to people who treat it like a weekly habit. Raj leads growth and had been segmenting buyers with K-Means for a year. It was fine. It was also, he would come to suspect, hiding things.
The problem
K-Means has a quiet flaw: it carves your customers into round, similar-sized blobs whether your data is shaped that way or not. Every buyer gets forced into the nearest segment, and the weird, interesting groups, the ones that don’t fit a neat sphere, get smeared into their bigger neighbors.
Raj’s gut said Tindo had a small population of extreme buyers: a handful of people spending orders of magnitude more than anyone else. But K-Means kept folding them into the “high-value” blob alongside merely good customers, diluting both. “I knew there were whales in there,” he said. “My segments just refused to show them to me as their own thing.”
The turning point
Raj switched on DBSCAN, Divisio’s density-based clustering. Unlike K-Means, DBSCAN doesn’t ask “how many segments?” and doesn’t force every point into one. It finds clusters of any shape by looking for dense regions, and it’s allowed to label the genuinely unusual points as outliers instead of cramming them somewhere they don’t belong.
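For the curious, here is a rough sketch of the idea in plain Python. It is not Divisio's implementation, which this story doesn't describe; it is the textbook DBSCAN procedure run on a handful of invented points. The parameter names `eps` (the neighborhood radius) and `min_pts` (how many neighbors make a region count as dense) are the conventional ones and are assumptions here. One nuance worth knowing: DBSCAN skips the segment count, but it still needs those two settings.

```python
from math import dist

NOISE = -1  # label for points that belong to no dense region


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for outliers."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Indices of every point within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        cluster += 1  # point i is dense enough to start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not dense
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            reach = neighbors(j)
            if len(reach) >= min_pts:
                queue.extend(reach)  # j is dense too, so keep expanding
    return labels


# Invented, already-scaled (spend, order count) pairs for illustration only.
points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),  # occasional buyers
    (5.0, 5.0), (5.2, 5.1), (4.9, 5.2), (5.1, 4.8),  # habitual buyers
    (20.0, 9.0),                                     # one extreme buyer
]

labels = dbscan(points, eps=0.6, min_pts=3)
print(labels)
```

In this toy run the lone point far from both groups comes back labeled -1, an outlier, rather than being attached to whichever cluster happens to be nearest.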
The result reshaped how Raj saw his buyers. DBSCAN surfaced a dense, oddly shaped core of habitual buyers K-Means had split awkwardly in two, and, critically, it isolated a tiny outlier group: a few hundred whales with spend patterns nothing like anyone else’s. It also flagged a separate cluster of high-velocity, low-diversity accounts that looked less like shoppers and more like resellers gaming promo codes.
He confirmed it with deep-dive charts. A scatter of spend vs. order count showed the whales sitting completely off on their own, exactly where a round segment would never have caught them.
How he did it
- Loaded his buyer behavior export (spend, frequency, basket diversity, promo usage).
- Switched the algorithm from K-Means to DBSCAN, with no need to preset a segment count.
- Let it find dense clusters and flag outliers rather than forcing every buyer into a group.
- Validated the whale and reseller groups with scatter and box-plot deep-dive charts.
- Split the outlier whales into their own audience and quarantined the suspected resellers.
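The last two steps, validating the groups and splitting out the outliers, might look something like the sketch below once cluster labels exist. Everything in it is hypothetical: the buyer IDs, the spend figures, and the row layout are invented, and the use of -1 for outliers is an assumption borrowed from scikit-learn's convention, not something confirmed about Divisio's export. The median comparison stands in for what a box plot would show at a glance.

```python
from collections import defaultdict
from statistics import median

NOISE = -1  # assumed outlier label (the convention scikit-learn uses)

# Hypothetical rows: (buyer_id, annual_spend, cluster_label). All invented.
rows = [
    ("b01", 40, 0), ("b02", 55, 0), ("b03", 35, 0),
    ("b04", 420, 1), ("b05", 510, 1), ("b06", 460, 1),
    ("b07", 9800, NOISE), ("b08", 12500, NOISE),
]


def split_by_label(rows):
    """Group (buyer_id, spend) pairs under their cluster label."""
    groups = defaultdict(list)
    for buyer_id, spend, label in rows:
        groups[label].append((buyer_id, spend))
    return dict(groups)


groups = split_by_label(rows)

# Box-plot-style sanity check: compare median spend across groups.
median_spend = {
    label: median(spend for _, spend in members)
    for label, members in groups.items()
}

# Outliers become a candidate audience to review; clustered buyers stay put.
outlier_audience = [buyer_id for buyer_id, _ in groups.get(NOISE, [])]

for label, value in sorted(median_spend.items()):
    name = "outliers" if label == NOISE else f"cluster {label}"
    print(f"{name}: n={len(groups[label])}, median spend={value}")
print("candidate VIP audience:", outlier_audience)
```

An outlier label only says a buyer is unusual, not why, so a check like this is a prompt for a closer look before anyone gets a VIP invitation or a quarantine flag.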
The payoff
Those few hundred whales got their own VIP program (early access, a concierge contact, no discount spam), and that micro-segment, once invisible, turned into an outsized share of retained revenue. Pulling the suspected resellers out of his “best customers” audience also stopped Raj from wasting loyalty perks on people exploiting promos.
The lesson stuck: his data was never round, so a round-by-design algorithm was always going to lie to him a little.
“K-Means gave me tidy segments that smeared my best customers into the crowd. DBSCAN let the whales be whales, and let the weird stuff be flagged as weird instead of pretending it belonged.”
Raj, growth lead at Tindo
Feature spotlight: Advanced clustering / DBSCAN (Pro)
Density-based clustering for data that isn’t tidy. DBSCAN finds segments of any shape without you guessing how many there are, and it’s allowed to flag genuine outliers instead of forcing every record into the nearest blob. When your most interesting customers are the ones who don’t fit the mold, this is the algorithm that actually shows them to you.